Artificial intelligence (AI), algorithms, data and GDPR

What applies to the data of an artificial intelligence?

Much in the field of artificial intelligence (AI) is still legally unclear. Of particular interest at present is what applies to the data used to train an AI, and what applies legally to the fully trained AI in which the training data is contained in some form (e.g. in the algorithm).

With regard to the data, a distinction must first be made between training data and result data. Training data in this sense is the data with which the AI is trained. Result data in this sense is the data contained in the fully trained AI. Depending on the exact design of the AI application, this data can take very different forms or, in some cases, not exist at all.

Training data

Training data is therefore the data with which the AI is "fed". In the legal assessment, there is an overlap here with "big data", a buzzword that is likewise not precisely defined in legal terms.

A further distinction must then be made as to what type of training data is involved:

Non-personal data

If no personal data is involved, it must be checked in particular whether the use of the data violates contractual or statutory confidentiality obligations, e.g. because the data is subject to a non-disclosure agreement (NDA) or to the professional secrecy of a doctor or another holder of professional secrets. In addition, it may be necessary to check whether other contractual obligations exist with regard to the data; such obligations can also arise implicitly as ancillary contractual duties, e.g. where a service provider receives data from a customer under a service contract. A breach of the German Trade Secrets Protection Act (GeschGehG) through unauthorized acquisition, disclosure or use of the data, which can even constitute a criminal offence, may also come into consideration. Furthermore, the rules of the ePrivacy Directive and of the ePrivacy Regulation, which is currently (albeit haltingly) making its way through the European legislative process, must be observed. Under ePrivacy law, one decisive factor is how the data or information was obtained, in particular whether it was read from a terminal device without authorization.

Further aspects of the legal framework for non-personal data have already been discussed at various events of our Digitalization Law & Industry 4.0 Forum and are regularly revisited there in light of new legislation and court decisions.

Personal data

If personal data is involved, data protection law, in particular the GDPR, must be observed. A legal basis must then be found for processing the data for the purpose of AI training. The balancing of interests under Art. 6(1)(f) GDPR may be considered as such a legal basis. However, this usually requires an assessment of the individual case of each data subject; for example, the balancing may come out differently for a child than for an adult. Furthermore, the balancing of interests is not available as a legal basis where health data or other special categories of personal data within the meaning of Art. 9 GDPR are affected.

Consent (Art. 6(1)(a) GDPR) is always a possible legal basis. However, its practical implementation usually raises considerable problems. First, every data subject must be sufficiently informed about what data processing is taking place, which is particularly difficult with AI systems. Second, consent under data protection law can be freely withdrawn at any time. This creates considerable follow-up problems once the data has been fed into the AI system, because it must then be clarified whether and to what extent data has to be deleted because it is covered by the withdrawn consent (a minimal technical sketch of such bookkeeping follows below).
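One way to keep this follow-up problem manageable in practice is to record which training records rest on which consent, so that a withdrawal can at least be traced back to the affected raw data. The following is a minimal illustrative sketch of our own; the class and field names are assumptions, not requirements of the GDPR or terms used in this article:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class ConsentRecord:
    subject_id: str
    granted_at: datetime
    withdrawn_at: Optional[datetime] = None

class ConsentLedger:
    """Links each data subject's consent to the training records based on it."""

    def __init__(self) -> None:
        self._consents: dict[str, ConsentRecord] = {}
        self._records_by_subject: dict[str, list[str]] = {}

    def grant(self, subject_id: str) -> None:
        # Record when consent was given.
        self._consents[subject_id] = ConsentRecord(subject_id, datetime.now(timezone.utc))

    def register_training_record(self, subject_id: str, record_id: str) -> None:
        # Link an ingested training record to the consent it rests on.
        self._records_by_subject.setdefault(subject_id, []).append(record_id)

    def withdraw(self, subject_id: str) -> list[str]:
        # Mark the consent as withdrawn and return the record IDs that must
        # now be reviewed for deletion before the next training run.
        self._consents[subject_id].withdrawn_at = datetime.now(timezone.utc)
        return self._records_by_subject.pop(subject_id, [])

# Example: a withdrawal yields the records that have to be re-examined.
ledger = ConsentLedger()
ledger.grant("subject-42")
ledger.register_training_record("subject-42", "record-7")
affected = ledger.withdraw("subject-42")  # -> ["record-7"]
```

Note that such bookkeeping only covers the raw training data; whether and how a withdrawal reaches the already trained AI itself is precisely the open legal question addressed further below.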

Another approach is to first abstract and anonymize the personal data and to train the AI only with this data. A variant of this method is to initially process the personal data on the basis of consent, but to treat the AI training itself as effecting anonymization, because the individual training data is no longer "recognizable" in the trained AI.
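What such a preparation step can look like technically is sketched below; the column names, the generalization rules and the threshold k are purely illustrative assumptions on our part. Note that removing direct identifiers alone does not establish anonymity in the legal sense, since quasi-identifiers such as age or postcode may still permit re-identification; hence the additional generalization and suppression steps:

```python
import pandas as pd

def prepare_training_data(df: pd.DataFrame, k: int = 5) -> pd.DataFrame:
    """Reduce a personal dataset to a coarsened version intended for AI training."""
    out = df.copy()

    # 1. Drop direct identifiers entirely.
    out = out.drop(columns=["name", "email"])

    # 2. Generalize quasi-identifiers: coarse age bands instead of exact ages,
    #    region prefixes instead of full postcodes.
    out["age_band"] = (out["age"] // 10) * 10
    out["region"] = out["postcode"].astype(str).str[:2]
    out = out.drop(columns=["age", "postcode"])

    # 3. Suppress rare combinations (a simple k-anonymity-style filter):
    #    keep only rows whose quasi-identifier combination occurs at least
    #    k times, so that no individual stands out.
    group_sizes = out.groupby(["age_band", "region"])["age_band"].transform("size")
    return out[group_sizes >= k]
```

Whether the result of such a step counts as anonymous within the meaning of the GDPR remains a case-by-case legal assessment; the code only illustrates the technical side of the approach described above.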

However, this raises the legal question of whether the process of anonymizing personal data is itself a processing operation that requires a legal basis under the GDPR. This question can and will be discussed in more detail in legal scholarship and will ultimately have to be assessed by the courts. According to the current positions of the Federal Commissioner for Data Protection and Freedom of Information (BfDI) and of the former Article 29 Working Party, the question must be answered in the affirmative: the mere generation of anonymized data from personal data already constitutes data processing that requires a legal basis. This also applies if further processing is subsequently carried out exclusively with the anonymized data.

Result data in the fully trained AI

The legal situation with regard to the result data, i.e. the data contained in the fully trained AI, must be assessed very carefully from a technical perspective. Often, reference is made here simply to "the algorithm". However, the term "AI" currently covers many different techniques and is used very loosely.

In each case, the extent to which the training data is contained in the fully trained AI must be evaluated. A whole spectrum is conceivable here:

  • At the left end of the spectrum is the simplest form of AI (which is not actually a "real" AI). Here, the training data is stored in full in a database, and the AI accesses this database for future decision-making.

  • At the right end of the spectrum is an AI that has derived insights from the training data and stores only an abstract result (i.e. an algorithm).

In the former situation, i.e. at the left end of the spectrum, the training data is fully present in the trained AI. The same legal framework therefore applies to the AI with regard to this data as applied to the training data. In the latter situation, i.e. at the right end of the spectrum, the training data has been anonymized; the AI is therefore no longer (or hardly) subject to the legal framework that applied to the training data. Between these two ends of the spectrum lie mixed situations, in which it is important to look closely at which data is still present and in what form; certain forms of "machine learning" in particular may fall here. The sketch below contrasts the two ends of the spectrum.
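As a purely illustrative sketch (our own example, not taken from any specific product): the first class below memorizes every training record and answers by lookup, so the training data remains fully present in the "trained" system; the second trains a simple logistic-regression model and retains only aggregate weights, so the individual records are no longer stored as such:

```python
import numpy as np

# Left end of the spectrum: a "memorizing" system that keeps every
# training record and answers by nearest-neighbour lookup.
class MemorizingClassifier:
    def fit(self, X, y):
        self.X, self.y = np.asarray(X, float), np.asarray(y)  # raw data retained
        return self

    def predict(self, x):
        # Look up the stored record closest to the query.
        i = np.argmin(np.linalg.norm(self.X - np.asarray(x, float), axis=1))
        return self.y[i]

# Right end of the spectrum: a parametric model that keeps only
# aggregate weights derived from the training data.
class LinearClassifier:
    def fit(self, X, y, lr=0.1, epochs=200):
        X, y = np.asarray(X, float), np.asarray(y, float)
        self.w, self.b = np.zeros(X.shape[1]), 0.0
        for _ in range(epochs):  # plain logistic-regression training loop
            p = 1.0 / (1.0 + np.exp(-(X @ self.w + self.b)))
            grad = p - y
            self.w -= lr * X.T @ grad / len(y)
            self.b -= lr * grad.mean()
        return self  # only w and b survive, not the training records

    def predict(self, x):
        return int((np.asarray(x, float) @ self.w + self.b) > 0)
```

Where a real system falls between these two poles, e.g. a model that partially memorizes individual training examples, determines how much of the legal framework for the training data continues to apply to the trained AI.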

However, it should be emphasized that there are as yet no special laws on this aspect of AI, nor is there any established case law. The legal situation is still developing, so that in addition to establishing the actual technical circumstances of the AI, an up-to-date assessment of the legal situation is required in each case. In particular, the extent to which a withdrawal of consent affects the fully trained AI is likely to be discussed more intensively. This applies especially where health data is involved or where the inner spheres of the general right of personality (namely privacy and intimacy) are affected in any way.

Conclusion

The existing laws can be applied to data processed in the context of artificial intelligence, even though they are not specifically tailored to it. With regard to training data, the legal situation can still be assessed comparatively well under the existing laws. With regard to the data in the fully trained AI, however, much depends on the exact technical design of the AI and on the form in which the training data is stored in it. The popular blanket assertion that an AI is a black box, so that it is no longer possible to know how it was trained and exactly what data it contains, does not hold in this generalized form under the current legal situation. The assessment spectrum described above allows legal assessments to be made even in such a black-box situation.

The above assessment of AI and data protection represents only one aspect of the legal situation surrounding artificial intelligence. Further questions arise in data protection law, e.g. regarding the lawfulness of being evaluated as a human by an AI (see Art. 22 GDPR), how the necessary data protection information (see Art. 13, 14 GDPR) can be provided, how to deal with access requests, whether data portability must be made possible, and how a necessary data protection impact assessment (DPIA) should be carried out. It should also be borne in mind that a fully trained AI (or the resulting algorithm) can qualify as a trade secret about which a company does not want to provide information, in order to preserve its competitive advantage, and which it in particular does not want to disclose. We will be happy to address these and other questions separately, e.g. as part of our regular Digitalization Law & Industry 4.0 Forum.

Date: 29 May 2020