Algorithms, apps & artificial intelligence 2: Can data protection laws be used to challenge discriminatory tech?

This is the second article from Cloisters’ Robin Allen QC and Dee Masters examining discriminatory technology.

Summary

In our first piece, which we initially published in November 2017, we explored the interplay between technology and the Equality Act 2010 (‘EA 2010’) concluding that algorithms, apps and artificial intelligence (‘AI’) have the potential to give rise to claims for direct discrimination, indirect discrimination, harassment and a failure to make reasonable adjustments.  We proposed various ways in which companies that deploy technology could protect themselves from litigation.

Here, we examine a new angle to this debate: the possibility of using data protection laws, specifically the General Data Protection Regulation (‘GDPR’) and the Data Protection Act 2018 (‘DPA 2018’), to stop and expose biased algorithms, machine learning and the tainted data which can give rise to breaches of the EA 2010.

We conclude that, for the most part, these new legislative initiatives may not be as effective as might be expected when it comes to tackling discriminatory algorithms and machine learning.  They do not expressly create a meaningful requirement for transparency in relation to the algorithms themselves, thereby allowing discriminatory technology to remain hidden.  Ironically, potential claimants may find that the lack of transparency itself takes centre stage in litigation: relying on case law such as C-109/88 Danfoss, they may argue that the burden of proof under the EA 2010 shifts onto organisations that use algorithms and machine learning to demonstrate that they are not discriminatory.

Fortunately, the DPA 2018 and the GDPR are more helpful in relation to the data sets used by algorithms and as part of machine learning, since the data subject has a right to access personal data that is being processed about them.  This may allow potential claimants to understand whether discriminatory data sets are being utilised, which could in turn be used to bring claims under the EA 2010.

Part A:  Identifying the problem – biased algorithms, machine learning & tainted data

At the heart of artificial intelligence is the algorithm.  An algorithm is a set of steps created by a programmer.  Algorithms usually perform repetitive and tedious tasks in lieu of human actors.  For example, when LinkedIn informs a user that someone within her network is also connected to five people who are her contacts, it is an algorithm – and not a human – that has quickly compared the two networks to find common contacts.
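As a minimal sketch of the kind of step-by-step comparison described above, the following Python function finds the contacts that two networks share. The function name and the sample names are hypothetical, and nothing here reflects how LinkedIn actually implements the feature.

```python
def common_contacts(network_a, network_b):
    """Return the contacts that appear in both networks."""
    # Converting each list to a set lets the comparison run as one intersection.
    return set(network_a) & set(network_b)


# Hypothetical contact lists for two users.
user_network = ["Asha", "Ben", "Chloe", "Dev"]
other_network = ["Ben", "Dev", "Erin"]

shared = common_contacts(user_network, other_network)
print(sorted(shared))  # ['Ben', 'Dev']
```

The steps are fixed in advance by the programmer; this is the sense in which an ordinary algorithm differs from the machine learning models discussed next.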

Algorithms are, of course, code written by humans for human purposes, and algorithms can discriminate on the grounds of protected characteristics when they become tainted by the unconscious assumptions and attitudes of their creators or where they process data which is tainted by discrimination. 

The power of an algorithm is often linked to machine learning which is a means of refining algorithms and making them more “intelligent”.  Here is an extract from “The privacy pro’s guide to explainability in machine learning”[1] published by the International Association of Privacy Professionals (IAPP)[2] which explains more –


What is machine learning? Machine learning is a technique that allows algorithms to extract correlations from data with minimal supervision. The goals of machine learning can be quite varied, but they often involve trying to maximize the accuracy of an algorithm’s prediction. In machine learning parlance, a particular algorithm is often called a “model,” and these models take data as input and output a particular prediction. For example, the input data could be a customer’s shopping history and the output could be products that customer is likely to buy in the future. The model makes accurate predictions by attempting to change its internal parameters — the various ways it combines the input data — to maximize its predictive accuracy. These models may have relatively few parameters, or they may have millions that interact in complex, unanticipated ways. As computing power has increased over the last few decades, data scientists have discovered new ways to quickly train these models. As a result, the number — and power — of complex models with thousands or millions of parameters has vastly increased. These types of models are becoming easier to use, even for non-data scientists, and as a result, they might be coming to an organization near you. 


The dangers of machine learning are illustrated by US research[3] from 2016 which revealed that a user searching for a “black-identifying name” was more likely to be shown personalised ads falsely suggesting that the person might have been arrested than a user searching for a “white-identifying name”.  The following examples are taken from the paper –

(i) Ads generated when searching for a “black-identifying name”: [image not reproduced]

(ii) Ads generated when searching for a “white-identifying name”: [image not reproduced]

The Fourth Report[4] of the House of Commons Science and Technology Committee also highlights the terrifying possibility that humans could stop being in control of algorithms as machine learning takes over –


Transparency would be more of a challenge, however, where the algorithm is driven by machine learning rather than fixed computer coding.  Dr Pavel Klimov of the Law Society’s Technology and the Law Group explained that, in a machine learning environment, the problem with such algorithms is that “humans may no longer be in control of what decision is taken, and may not even know or understand why a wrong decision has been taken, because we are losing sight of the transparency of the process from the beginning to the end”.  Rebecca MacKinnon from think-tank New America has warned that “algorithms driven by machine learning quickly become opaque even to their creators, who no longer understand the logic being followed”.  Transparency is important, but particularly so when critical consequences are at stake.  As the Upturn and Omidyar Network have put it, where “governments use algorithms to screen immigrants and allocate social services, it is vital that we know how to interrogate and hold these systems accountable”. Liberty stressed the importance of transparency for those algorithmic decisions which “engage the rights and liberties of individuals”  (footnotes removed)


The dangers of discriminatory data, which can then be used as part of the machine learning process to “learn” discrimination, were also highlighted by the House of Commons Science and Technology Committee.  One example which supported their concerns related to facial recognition technology, which is becoming increasingly prevalent in the criminal justice system.  However, as identified in the Fifth Report[5] of the House of Commons Science and Technology Committee –


As we noted in our recent report on Algorithms,[6] research at MIT in the US found that widely used facial-recognition algorithms were biased because they had been ‘trained’ predominantly on images of white faces. The systems examined correctly identified the gender of white men 99% of the time, but the error rate rose for people with darker skin, reaching 35% for black women.


The research to which they were referring was carried out by Joy Buolamwini and Timnit Gebru[7] and it demonstrates powerfully the pitfalls of using discriminatory data.  The Abstract published at the head of this research states –


Recent studies demonstrate that machine learning algorithms can discriminate based on classes like race and gender. In this work, we present an approach to evaluate bias present in automated facial analysis algorithms and datasets with respect to phenotypic subgroups. [We found that currently widely used] datasets are overwhelmingly composed of lighter-skinned subjects … and introduce a new facial analysis dataset which is balanced by gender and skin type. We evaluate 3 commercial gender classification systems using our dataset and show that darker-skinned females are the most misclassified group (with error rates of up to 34.7%). The maximum error rate for lighter-skinned males is 0.8%. The substantial disparities in the accuracy of classifying darker females, lighter females, darker males, and lighter males in gender classification systems require urgent attention if commercial companies are to build genuinely fair, transparent and accountable facial analysis algorithms.


The authors then set about curing the bias by creating a new data set built from male and female faces drawn from parliaments around the world with impressive levels of gender parity.  This created a more balanced representation of both gender and racial diversity.  Their paper illustrates the range of faces they used pictorially.[8]

In other words, it is possible to create better technology by eliminating discriminatory machine learning.
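The kind of disparity reported in the research can be measured by breaking a classifier's error rate down by subgroup. The following is a minimal sketch only: the function name is hypothetical and the records are invented toy values, not data from the paper.

```python
from collections import defaultdict


def error_rate_by_group(records):
    """Compute the share of wrong predictions for each subgroup.

    records: iterable of (group, predicted_label, true_label) tuples.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, predicted, actual in records:
        totals[group] += 1
        if predicted != actual:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


# Invented toy records for illustration only (not taken from the research).
toy_records = [
    ("lighter-skinned male", "male", "male"),
    ("lighter-skinned male", "male", "male"),
    ("darker-skinned female", "male", "female"),
    ("darker-skinned female", "female", "female"),
]

rates = error_rate_by_group(toy_records)
for group, rate in rates.items():
    print(f"{group}: {rate:.0%} misclassified")
```

A single overall accuracy figure would hide the gap that this breakdown exposes, which is why subgroup reporting matters when assessing whether a data set is balanced.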

Part B:  Using data protection laws to stop discrimination

There is now no doubt that algorithms can create discrimination.  The Working Party set up under Article 29 of the Data Protection Directive (95/46/EC) as the independent European advisory body on data protection and privacy (‘WP29’)[9] issued “Guidelines on Automated individual decision-making and Profiling for the purposes of Regulation 2016/679”[10] – i.e. the GDPR.  The WP29 Guidelines state the problem clearly –


Profiling and automated decision-making can be useful for individuals and organisations, delivering benefits such as: · increased efficiencies; and · resource savings. They have many commercial applications, for example, they can be used to better segment markets and tailor services and products to align with individual needs. Medicine, education, healthcare and transportation can also all benefit from these processes. However, profiling and automated decision-making can pose significant risks for individuals’ rights and freedoms which require appropriate safeguards. These processes can be opaque. Individuals might not know that they are being profiled or understand what is involved. Profiling can perpetuate existing stereotypes and social segregation. It can also lock a person into a specific category and restrict them to their suggested preferences. This can undermine their freedom to choose, for example, certain products or services such as books, music or newsfeeds. In some cases, profiling can lead to inaccurate predictions. In other cases it can lead to denial of services and goods and unjustified discrimination.


We now consider whether data protection legislation, specifically the GDPR and the DPA 2018 (which enshrines the GDPR in the UK), can help challenge this type of technology.

Prohibition on algorithms and machine learning under Article 22 of the GDPR

The DPA 2018 and the GDPR apply to the “processing” of “personal data”.  The definition of “processing” is sufficiently broad to cover the application of algorithms to personal data –


‘Processing’ means any operation or set of operations which is performed on personal data or on sets of personal data, whether or not by automated means, such as … use  (GDPR, Article 4 (2))


The DPA 2018 and the GDPR also cover “profiling” which engages directly with machine learning –

‘Profiling’ means any form of automated processing of personal data consisting of the use of personal data to evaluate certain personal aspects relating to a natural person, in particular to analyse or predict aspects concerning the natural person’s performance at work, economic situation, health, personal preferences, interests, reliability, behaviour, location or movements (GDPR, Article 4(4)).


The ICO guidance on automated decision-making and profiling[11] confirms that machine learning and algorithms fall within the GDPR.  It offers various examples which, depending on the mechanics and data used, could well create discriminatory outcomes –


Example Profiling is used in some medical treatments, by applying machine learning to predict patients’ health or the likelihood of a treatment being successful for a particular patient based on certain characteristics.

Automated decision-making is the process of making a decision by automated means without any human involvement.  These decisions can be based on factual data, as well as on digitally created profiles or inferred data.  Examples of this include: an online decision to award a loan; and an aptitude test used for recruitment which uses pre-programmed algorithms and criteria.


Importantly, under the DPA 2018 and Article 22 of the GDPR, a data subject cannot be subject to decisions made in consequence of the pure application of an algorithm (whether or not underpinned by machine learning) where there are legal consequences for him or her or similarly significant repercussions.   

The precise language of the GDPR is as follows –  


The data subject shall have the right not to be subject to a decision based solely on automated processing, including profiling, which produces legal effects concerning him or her or similarly significantly affects him or her (Article 22 (1)).


The ICO guidance[12] states that a decision which discriminates against a data subject would have a sufficiently significant effect to fall within Article 22.

The ICO guidance[13] provides the following example of an algorithm which is prohibited under Article 22 –


Example: As part of their recruitment process, an organisation decides to interview certain people based entirely on the results achieved in an online aptitude test.  This decision has a significant effect, since it determines whether or not someone can be considered for the job.


But, the prohibition in Article 22 is heavily qualified since –

(i) It does not apply to decisions where there is any degree of human intervention.  For example, an employer who deployed an algorithm to monitor attendance[14] would not fall under Article 22 if a manager still made the final decision whether or not to initiate capability proceedings.

(ii) Further, Article 22 does not apply where the processing is necessary for entering into, or the performance of, a contract between the data subject and a data controller (for example, an employment contract).  Please note that the data controller does not need to be a party, or a potential party, to the contract with the data subject; it could be a third party data controller.  Accordingly, the ICO is clear in its guidance[15] that a financial organisation could rely on an automatically generated credit score carried out by a third party credit reference agency to decide whether or not to enter into a contract with the data subject.  This is a very significant limitation on Article 22.

(iii) Finally, Article 22 does not apply where the processing is authorised by European or national law, provided that there are suitable safeguarding measures in place (Article 22 (2)).  This is again a very wide exception, as illustrated by an example outlined by the ICO[16].  It explains that an organisation in the financial services sector using automated decision making to detect fraud could justify that under Article 22 (2) pursuant to high level regulatory requirements to detect and prevent crime.

In those circumstances, we suspect that Article 22 will not apply to a significant number of algorithms.  Challenging algorithms which are not prohibited by Article 22 will need to be done under the remaining provisions of the DPA 2018 and GDPR (as further explored below) or via the EA 2010. 
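The attendance example in (i) can be made concrete with the Bradford Factor mentioned in footnote 14, which is commonly calculated as S² × D, where S is the number of separate absence spells and D is the total number of days absent. The sketch below is illustrative only: the function names and the threshold are hypothetical, and the point is simply that the code produces a flag for a manager to consider rather than a decision. Whether a given level of human involvement is enough to take a process outside Article 22 remains a legal question that code cannot settle.

```python
def bradford_factor(spell_lengths):
    """Bradford Factor: (number of absence spells squared) x (total days absent).

    spell_lengths: list with the length in days of each separate absence.
    """
    spells = len(spell_lengths)
    total_days = sum(spell_lengths)
    return spells ** 2 * total_days


def flag_for_manager_review(spell_lengths, threshold):
    """Return True if the score meets a (hypothetical) review threshold.

    This only raises a flag; any decision about capability proceedings
    is left to a human manager.
    """
    return bradford_factor(spell_lengths) >= threshold


# Ten days of absence scores very differently depending on how it is spread.
one_long_absence = [10]                     # 1 spell:   1**2 * 10 = 10
five_short_absences = [2, 2, 2, 2, 2]       # 5 spells:  5**2 * 10 = 250

print(bradford_factor(one_long_absence))
print(bradford_factor(five_short_absences))
```

Because the formula penalises frequent short absences so heavily, it may weigh more on employees whose conditions cause intermittent absence, which is one reason the final human review matters.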

We add two further comments on the utility of Article 22.  In the WP29 Guidelines there is detailed consideration of Article 22.  The Guidelines recognise the extent to which Article 22 is limited, as we have set out above, and then add[17] –


Any processing likely to result in a high risk to data subjects requires the controller to carry out a Data Protection Impact Assessment (DPIA).[18] As well as addressing any other risks connected with the processing, a DPIA can be particularly useful for controllers who are unsure whether their proposed activities will fall within the Article 22(1) definition, and, if allowed by an identified exception, what safeguarding measures must be applied.


We would certainly endorse this and encourage Data Controllers to undertake a DPIA whenever there is the potential for a high risk.

Objecting to discriminatory algorithms and machine learning under Article 21 of the GDPR

On the assumption that an algorithm is not prohibited by Article 22, there is a limited right afforded to data subjects to object to the use of algorithms and machine learning under the DPA 2018 and the GDPR –


The data subject shall have the right to object, on grounds relating to his or her particular situation, at any time to processing of personal data concerning him or her which is based on point (e) or (f) of Article 6 (1), including profiling based on those provisions.  The controller shall no longer process the personal data unless the controller demonstrates compelling legitimate grounds for the processing which override the interests, rights and freedoms of the data subject or for the establishment, exercise or defence of legal claims (GDPR, Article 21 (1)).


The right is further limited because it will only arise where the lawful basis for using algorithms and machine learning is either that –

(i) Processing is necessary for the performance of a task carried out in the public interest or in the exercise of official authority vested in the controller (GDPR, Article 6(1)(e)); or

(ii) Processing is necessary for the purposes of the legitimate interests pursued by the controller or by a third party, except where such interests are overridden by the interests or fundamental rights and freedoms of the data subject which require protection of personal data, in particular where the data subject is a child (GDPR, Article 6(1)(f)).

In other words, a data subject cannot object where the data controller is using an algorithm or machine learning and the processing is necessary for the performance of a contract to which the data subject is a party (perhaps an employment contract), to comply with a legal obligation to which the controller is subject, or to protect the vital interests of a data subject or another natural person (GDPR, Article 6(1)(b), (c), (d)).  The breadth of this limitation is discussed above in relation to Article 22.

Whilst we recognise that recital 71 to the GDPR stresses that profiling should be conducted in a way which prevents discrimination, we suspect that the many limitations on Article 21 mean that it will often not provide a fruitful means of challenging discriminatory technology.

Part C:  Accessing evidence to demonstrate discrimination

So, it appears that the data protection legislation may not be such an effective tool for stopping discriminatory technology.  It might be, as we explained in our first piece, that the EA 2010 is a better means of challenging algorithms.  Whilst there are many documented examples of discriminatory technology, many of these instances have been exposed only through painstaking and no doubt expensive research.  By way of example, journalists at ProPublica had to analyse 7,000 “risk scores” in the US to identify that a machine learning tool deployed in some states was nearly twice as likely to falsely predict that black defendants would be criminals in the future in comparison to white defendants.[19]  Most claimants will not have access to this level of resource.  In the rest of this article, we look at the extent to which the GDPR and the DPA 2018 could be used to access material which could, at least, support challenges to discriminatory algorithms and the machine learning which underpins them, as well as exposing tainted data sets.
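The comparison described in the risk score example is, in statistical terms, a comparison of false positive rates between groups: among people who did not go on to reoffend, what share had been predicted to do so? The following is a minimal sketch of that calculation. The function name, group labels and records are hypothetical and invented for illustration; they are not the journalists' data or method.

```python
def false_positive_rate_by_group(records):
    """Share of non-reoffenders in each group who were wrongly scored high risk.

    records: iterable of (group, predicted_high_risk, reoffended) tuples,
    where the last two values are booleans.
    """
    non_reoffenders = {}
    false_positives = {}
    for group, predicted_high_risk, reoffended in records:
        if reoffended:
            continue  # only people who did not reoffend can be false positives
        non_reoffenders[group] = non_reoffenders.get(group, 0) + 1
        if predicted_high_risk:
            false_positives[group] = false_positives.get(group, 0) + 1
    return {
        group: false_positives.get(group, 0) / count
        for group, count in non_reoffenders.items()
    }


# Invented toy records for illustration only.
toy_scores = [
    ("group A", True, False),
    ("group A", False, False),
    ("group A", True, True),
    ("group B", False, False),
    ("group B", False, False),
    ("group B", False, False),
    ("group B", True, False),
]

fpr = false_positive_rate_by_group(toy_scores)
print(fpr)
```

The calculation itself is simple; the difficulty for claimants is obtaining enough scored records and outcomes to run it, which is the evidential problem the rest of this article addresses.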

Exposing algorithms & machine learning using the principle of transparency in the GDPR

At first blush, invoking the principle of transparency created by the DPA 2018 and the GDPR in relation to algorithms and machine learning looks promising.  In theory, forcing creators and buyers of algorithms to be transparent should create an opportunity to scrutinize technology and identify discrimination.

In broad terms, the relevant provisions are as follows –

(i) “Personal data shall be … (a) processed lawfully, fairly and in a transparent manner in relation to the data subject” (GDPR, Article 5(1)).

(ii) When personal data is collected, there is a duty to inform the data subject in “a concise, transparent, intelligible and easily accessible form” (GDPR, Article 12(1)) of “the purpose of the processing” (GDPR, Article 13(1)(c)) and “the existence of any automated decision-making, including profiling … and, at least in those cases [i.e. profiling cases], meaningful information about the logic involved, as well as the significance and the envisaged consequences of such processing for the data subject” (GDPR, Article 13(2)(f)).

The provisions quoted above initially appear promising.  However, the GDPR does not go so far as to say that the algorithm or basis for the machine learning must be disclosed.  Indeed, the ICO guidance on automated decision-making and profiling suggests that the principle of transparency is fairly weak –


How can we explain complicated processes in a way that people will understand?
Providing ‘meaningful information about the logic’ and ‘the significance and envisaged consequences’ of a process doesn’t mean you have to confuse people with over-complex explanations of algorithms. You should focus on describing: the type of information you collect or use in creating the profile or making the automated decision; why this information is relevant; and what the likely impact is going to be / how it’s likely to affect them.
Example
An on-line retailer uses automated processes to decide whether or not to offer credit terms for purchases.  These processes use information about previous purchase history with the same retailer and information held by the credit reference agencies, to provide a credit score for an online buyer. The retailer explains that the buyer’s past behaviour and account transaction indicates the most appropriate payment mechanism for the individual and the retailer. Depending upon the score customers may be offered credit terms or have to pay upfront for their purchases.


If the principle of transparency enshrined within the GDPR means simply that organisations are under an obligation to provide rather superficial and trite explanations, it is highly unlikely that it will give rise to meaningful scrutiny of technology.  Certainly it seems unlikely that an organisation would provide sufficient information to allow a potential claimant to demonstrate that a particular algorithm was discriminatory.

Using an inability to explain to shift the burden of proof

Ironically, it may be that the lack of transparency in relation to technology takes centre stage.  Discrimination practitioners will be very familiar with the line of European authorities, such as C-109/88 Danfoss, which establish that a lack of transparency in a pay system can give rise to prima facie discrimination.  The principle would equally translate to challenges to discriminatory technology.  If, as some commentators have suggested (see the Science and Technology Committee extract in Part A above), it is not possible to explain how an algorithm is operating, there is a real risk of a successful discrimination claim, as the user of the technology will not be able to provide a non-discriminatory explanation for the treatment of the claimant.

Exposing tainted data under Article 15 of the GDPR

Fortunately, the DPA 2018 and the GDPR are more helpful in relation to the data used by algorithms and as part of machine learning.  Specifically, the data subject has a right to be told if personal data is being processed and, if so, to have access to that data and the categories of personal data concerned (GDPR, Article 15).  This may allow potential claimants to understand if information concerning protected characteristics, for example race or gender, is being used by an algorithm or as part of machine learning.  Similarly, data subjects may be able to see if indirect discrimination is occurring if data is being used which is linked to particular protected characteristics, for example, part-time working.  Inevitably, group litigation where a number of claimants pooled resources and shared personal data might well be even more effective at demonstrating that data sets are discriminatory.
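To illustrate how pooled Article 15 disclosures might be analysed, the sketch below compares favourable outcome rates across the values of a single field, here part-time status as a possible proxy for a protected characteristic. It is a hedged illustration only: the function name, the field names (`part_time`, `selected`) and the records are hypothetical, and a difference in rates is a starting point for legal analysis rather than proof of indirect discrimination.

```python
def favourable_rate_by_attribute(records, attribute, outcome_key="selected"):
    """Share of favourable outcomes for each value of the chosen attribute.

    records: list of dicts, one per data subject, built from pooled disclosures.
    """
    counts = {}
    for record in records:
        value = record[attribute]
        total, favourable = counts.get(value, (0, 0))
        counts[value] = (total + 1, favourable + (1 if record[outcome_key] else 0))
    return {
        value: favourable / total
        for value, (total, favourable) in counts.items()
    }


# Invented pooled records for illustration only.
pooled_records = [
    {"part_time": True, "selected": False},
    {"part_time": True, "selected": False},
    {"part_time": True, "selected": True},
    {"part_time": False, "selected": True},
    {"part_time": False, "selected": True},
    {"part_time": False, "selected": False},
    {"part_time": False, "selected": True},
]

selection_rates = favourable_rate_by_attribute(pooled_records, "part_time")
print(selection_rates)
```

The more data subjects contribute their records, the more reliable such a comparison becomes, which is why pooling data in group litigation is likely to be more effective than an individual request.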

Conclusion

Discriminatory technology and the march of AI are now hot topics.  We predict that discrimination lawyers will be increasingly asked to advise on using the EA 2010 as a means of challenging automated decision making.  The DPA 2018 and the GDPR are limited in scope, but the ability to access data sets may well assist claimants to formulate and pursue claims against employers and service providers.

As well as the ICO guidance, practitioners should look at the updates to guidance given by WP29 (which ceased to operate on 25 May 2018) and by the European Data Protection Board (EDPB), which subsequently came into operation. The website of the EDPB can be consulted at the following address: https://edpb.europa.eu/


[1] With thanks to John Higgins CBE, previously Director – General of Digital Europe for suggesting this site.

[2] https://iapp.org/news/a/the-privacy-pros-guide-to-explainability-in-machine-learning/

[3] “Discrimination in online ad delivery”, Latanya Sweeney, Harvard University, 2016:  https://dataprivacylab.org/projects/onlineads/1071-1.pdf

[4] The House of Commons Science and Technology Committee, Report on Algorithms in decision making, the Fourth Report of Session 2017–19, is available here  https://publications.parliament.uk/pa/cm201719/cmselect/cmsctech/351/351.pdf

[5] https://publications.parliament.uk/pa/cm201719/cmselect/cmsctech/800/80003.htm

[6] See Science and Technology Committee, The use of algorithms in decision-making, Fourth Report of Session 2017–19, HC 351.

[7] See “Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification” in the Proceedings of Machine Learning Research 81:1–15, 2018 Conference on Fairness, Accountability, and Transparency;  this is available at http://proceedings.mlr.press/v81/buolamwini18a/buolamwini18a.pdf

[8] Ibid. figure 1.

[9] Known as the “Article 29 Data Protection Working Party” or “WP29”. 

[10] These Guidelines were adopted on the 3rd October 2017 and most recently revised and adopted on the 6th February 2018. See http://ec.europa.eu/newsroom/article29/item-detail.cfm?item_id=612053

[11] https://ico.org.uk/for-organisations/guide-to-the-general-data-protection-regulation-gdpr/automated-decision-making-and-profiling/

[12] See fn 11.

[13] See fn 11.

[14] You may also be aware of systems using forms of machine learning to make dismissal decisions.  The so-called Bradford Formula or Factor, developed by the Bradford University School of Management in the 1980s for calculating the significance of staff absence, has been around for some time and is now sold by many providers as a personnel management tool.

[15] See fn 11.

[16] See fn 11.

[17] See ibid. supra [IV] at p.20.

[18] Article 29 Data Protection Working Party Guidelines on Data Protection Impact Assessment (DPIA) and determining whether processing is “likely to result in a high risk” for the purposes of Regulation 2016/679. 4 April 2017. European Commission.  http://ec.europa.eu/newsroom/document.cfm?doc_id=44137 Accessed 24 April 2017.

[19] https://www.nytimes.com/2016/08/01/opinion/make-algorithms-accountable.html