Machine learning and data privacy

Machine learning tools are increasingly deployed across sectors, whether to assess eligibility for a loan or to score candidates during a recruitment process. The main privacy issues relating to machine learning (ML) tools come from the collection of large amounts of information, coupled with the tool’s ability to make autonomous decisions and take actions aimed at maximising success.

As the GDPR comes into force tomorrow (25 May 2018), the recently published Article 29 Working Party guidelines on automated individual decision making and profiling are likely to create hurdles for any ML activity.

Let’s start by quoting the GDPR itself: ‘data subjects shall have the right not to be subject to a decision based solely on automated processing, including profiling, which produces legal effects concerning him or her or similarly significantly affects him or her’. At first sight, this seems to say that only decisions based solely on automated processes trigger a right for the data subject to opt out. However, the Article 29 Working Party makes clear that a decision escapes this rule only where there is meaningful human intervention in it.

Deploying AI/ML tools

There are exemptions to the rules surrounding automated decisions within the regulation. These include fraud prevention or money laundering checks, cases where the decision is necessary for entering into, or the performance of, a contract, and cases where it is based on the individual’s prior consent. However, these exemptions must be interpreted extremely narrowly. For example, the word ‘necessary’ could be misleading: what it means is that, before proceeding, the organisation must consider whether the same result could be achieved in a way that is less intrusive on privacy. The performance of a contract exemption does not apply to special categories of data such as health and therefore, for such data, it is likely that consent will need to be obtained.

Furthermore, the GDPR specifies that organisations deploying AI/ML tools must provide a meaningful explanation of the logic used. This is a softer provision than in the initial versions of the GDPR, but it will still prove extremely important. In practice, this means that privacy notices cannot simply refer in general terms to all the tools used, but have to include meaningful information about how a decision is reached, what data is used and the relevance of that data.
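To make the idea of ‘meaningful information’ more concrete, the sketch below shows one way a simple scoring tool could report which data it used and how much each item weighed in the outcome. It is an illustration only: the feature names, weights, threshold and the explain_decision function are hypothetical, and a linear score is assumed because its logic is easy to state. Nothing here is prescribed by the GDPR or the Working Party guidelines.

```python
# Hypothetical weights for an illustrative loan-scoring model.
# These names and numbers are assumptions made for this sketch only.
WEIGHTS = {
    "income_thousands": 0.04,
    "years_at_address": 0.10,
    "missed_payments": -0.80,
}
BIAS = -1.0        # assumed baseline score
THRESHOLD = 0.0    # assumed cut-off: approve when score >= THRESHOLD


def explain_decision(applicant: dict) -> dict:
    """Return the decision together with each input's contribution to it."""
    # In a linear model, each feature's contribution is weight * value.
    contributions = {
        name: weight * applicant[name] for name, weight in WEIGHTS.items()
    }
    score = BIAS + sum(contributions.values())
    # Rank the inputs by how strongly they moved the score, in either direction.
    factors = sorted(
        contributions.items(), key=lambda item: abs(item[1]), reverse=True
    )
    return {"approved": score >= THRESHOLD, "score": score, "factors": factors}


if __name__ == "__main__":
    result = explain_decision(
        {"income_thousands": 30, "years_at_address": 2, "missed_payments": 1}
    )
    print("Approved:", result["approved"])
    for name, contribution in result["factors"]:
        print(f"  {name}: {contribution:+.2f}")
```

A ranked list of this kind could feed the wording of a privacy notice or a response to an individual. For more complex models the contributions are not this easy to read off, which is where the black box issue arises.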

Privacy risks

Unfortunately, it is becoming clearer that biased datasets are a huge problem for machine learning. In fact, anything described in text, an image or in voice requires information processing, and this will be influenced by cultural, gender or race biases. It is not possible for algorithms to remain immune from the human values of their creators. Because of this, the accountability of algorithms will be key for the future and there will be a renewed focus on developing a diverse workforce. In fact, the need for diverse development teams and truly representative datasets to avoid biases is a key recommendation issued by the House of Lords Artificial Intelligence Committee in its recent report, ‘AI in the UK: ready, willing and able?’.
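As a rough illustration of what checking a dataset for representation might involve, the sketch below counts how large a share of the records each group makes up and how often each group received a favourable outcome. The group_rates function and the field names are hypothetical, and the check is deliberately simple: a large gap between groups is a prompt for further investigation, not proof of bias.

```python
from collections import Counter


def group_rates(records, group_key, outcome_key):
    """Per group: share of the dataset and rate of favourable outcomes."""
    totals = Counter()
    positives = Counter()
    for record in records:
        group = record[group_key]
        totals[group] += 1
        if record[outcome_key]:
            positives[group] += 1
    n = len(records)
    return {
        group: {
            "share": totals[group] / n,
            "positive_rate": positives[group] / totals[group],
        }
        for group in totals
    }


if __name__ == "__main__":
    # Toy records invented for this sketch; field names are assumptions.
    sample = [
        {"gender": "female", "shortlisted": True},
        {"gender": "female", "shortlisted": False},
        {"gender": "male", "shortlisted": True},
        {"gender": "male", "shortlisted": True},
        {"gender": "male", "shortlisted": False},
        {"gender": "male", "shortlisted": True},
    ]
    for group, stats in group_rates(sample, "gender", "shortlisted").items():
        print(group, stats)
```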

Ultimately, the two most important principles of the GDPR are transparency, which underpins the right of data subjects to exercise control over their personal data (Article 12), and accountability, which establishes the data controllers’ duty towards the data subject (Article 24). Both principles will need to be applied to the AI/ML world, as data subjects have a right to know the logic underpinning the automated decisions made about them.

In order to achieve all of the above, privacy by design must become a key feature. Privacy Impact Assessments will assess how the application meets the principles of the GDPR and will inform practices to minimise privacy risks. In broad terms, mitigating actions could include: reducing the amount of training data that needs to be used (and there are a number of ways to achieve this, such as Generative Adversarial Networks); protecting privacy without reducing the data basis, which is where cryptography could offer some interesting possibilities; and identifying possible ways to avoid the black box issue.

Written by Ivana Bartoletti, Head of Privacy and Data Protection, Gemserv