Content note: In talking about the role of consent in data analysis, I briefly consider the definition of consent as it relates to physical relationships in the USA.
This week, I was part of a panel discussion of CLIR Fellows at the 2017 DLF Forum conference. I'm noting here a question to which I alluded during my talk (the text for which is linked above), and which I discussed with several other conference participants afterward. It's a question I've been struggling with for months: Is it ethically possible for individuals who cannot be reasonably expected to understand data mining to consent to the use of their data? Or, put from a different viewpoint, is it ethical for data analysts to ask for that consent?
The question originally came to my mind based on this chain of logic:
In other cultural domains (beyond data analysis) in the USA, a person is said to be able to consent only to the extent that that person has the capacity to understand the implications of that consent.
For example, RAINN (The Rape, Abuse & Incest National Network) states the following about consent as it relates to physical relationships:
In general, there are three main ways that states analyze consent... [The third of these is] Capacity to consent: Did the individual have the capacity, or legal ability, to consent?... A person’s capacity, or ability, to legally consent to sexual activity can be based on a number of factors, which often vary from state to state. [These include] age[,]... developmental disability[,]... intoxication[,]... unconsciousness[,]... [and status as a] vulnerable adult.
The Belmont Report (I quote extensively from the Belmont Report in this post; beyond quoting under Fair Use principles, the Report does seem to be in the public domain in the USA), a document that governs research that involves human participants, is consistently introduced to all researchers who work with human participants, across training programs, including to those with a psychology background, such as myself. Professionally, I think, anyone who has received training that involves the Belmont Report, and certainly those with doctoral-level training, remains lastingly ethically bound to its principles, even when doing research that does not qualify under the USA's federal definition of "Human Subjects" research. At the least, anyone who has obtained an advanced degree in the social sciences knows better, ethically, than to ignore these principles, even when review by an ethics board is not legally required, as in some data mining projects.
The Belmont Report states the following about informed consent on the part of human research participants (emphasis added):
- Respect for persons requires that subjects, to the degree that they are capable, be given the opportunity to choose what shall or shall not happen to them.... There is widespread agreement that the consent process can be analyzed as containing three elements: information, comprehension and voluntariness....
- A simple listing of items does not answer the question of what the standard should be for judging how much and what sort of information should be provided....
- While there is always an obligation to ascertain that the information about risk to subjects is complete and adequately comprehended, when the risks are more serious, that obligation increases....
- Special provision may need to be made when comprehension is severely limited -- for example, by conditions of immaturity or mental disability. Each class of subjects that one might consider as incompetent (e.g., infants and young children, mentally disable [sic] patients, the terminally ill and the comatose) should be considered on its own terms.... [See the end of this post for more on the sentence that comes after this section.]
- Voluntariness. An agreement to participate in research constitutes a valid consent only if voluntarily given.... Undue influence, by contrast, occurs through an offer of an excessive, unwarranted, inappropriate or improper reward or other overture in order to obtain compliance. Also, inducements that would ordinarily be acceptable may become undue influences if the subject is especially vulnerable.
Regarding what kinds of "harm" are possible from research (the lack of each of which, it seems to me, could possibly influence an individual's willingness to consent to the use of their data; for example, might a person fear experiencing a major social cost for not consenting to data use by a service provider such as Facebook? If so, is that cost large enough to warrant examining? I'm not sure what the answer to either question is), the Belmont Report states,
There are, for example, risks of psychological harm, physical harm, legal harm, social harm and economic harm and the corresponding benefits. While the most likely types of harms to research subjects are those of psychological or physical pain or injury, other possible kinds should not be overlooked.
It is often very difficult to understand the threat vectors around data storage and usage. This is the case both for those of us who professionally consider and attempt to mitigate risks (see, for example, the Netflix Prize incident, including the original paper by Narayanan and Shmatikov and their FAQ about the paper, as well as the 2014 re-identification of an ostensibly-anonymized public dataset of New York City taxi data) and for lay individuals. Lay individuals misunderstand the implications of ways of conceptualizing privacy, as Solove argued in 2007, and misunderstand the technical mechanisms by which privacy breaches can happen, including through public discourse: President Barack Obama's statement in 2013 that "no one is listening to your calls," for example, promoted a (reasonable) misunderstanding among the public that data mining occurs through individual analysts manually looking over data. While manual analysis is, of course, a part of any work with data, my point here is that if a person has never heard of a relational database, for example, it is likely not immediately understandable how (or even that) datasets can be joined to each other years after they are collected, even at a scale at which manual analysis is infeasible.
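To make the joining point concrete, here is a minimal sketch of a "linkage attack": two datasets, neither of which pairs a name with sensitive data on its own, are joined on a shared quasi-identifier. All of the records, field names, and values below are invented for illustration; real attacks (like the taxi-data case) work at far larger scale, but the mechanism is the same.

```python
# Hypothetical illustration of linking two "anonymized" datasets.
# All records here are invented for the example.

# Dataset A: "anonymized" search logs (no names, but timestamp + location).
search_logs = [
    {"timestamp": "2017-03-01T14:02", "branch": "Central", "query": "sensitive topic X"},
    {"timestamp": "2017-03-01T15:10", "branch": "North", "query": "sensitive topic Y"},
]

# Dataset B: a separately released record that does contain names
# (e.g., a sign-in sheet), sharing the same timestamp + location fields.
entry_records = [
    {"timestamp": "2017-03-01T14:02", "branch": "Central", "name": "Patron A"},
]

# A relational-style join: index one dataset by the quasi-identifier...
index = {(r["timestamp"], r["branch"]): r["name"] for r in entry_records}

# ...then look each record in the other dataset up against that index.
reidentified = []
for log in search_logs:
    key = (log["timestamp"], log["branch"])
    if key in index:
        reidentified.append((index[key], log["query"]))

print(reidentified)  # → [('Patron A', 'sensitive topic X')]
```

The point is that neither dataset looks risky in isolation; the risk emerges only when someone, possibly years later, holds both and knows to join them, which is exactly the kind of possibility a lay user cannot be expected to foresee.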
- If the risks of analysis on datasets that are large in scope, time span, or sensitivity are hard enough to understand that one needs years of training even to partially understand them, and the potential impacts of data disclosure are high, can people consent to the use of their data?
In the case of library data, for example, one low-probability, high-impact risk is that a dataset of search queries exposes sensitive information about a person's interests, orientations, or (e.g., politically charged) research.
This seems to me like a major ethical problem, because some industries require a "Yes" answer in order to continue to function.
Arguments that, yes, people can consent to data use, and that it is ethical for data analysts to ask:
The Belmont Report does state that "Even for... persons ["when comprehension is severely limited"], however, respect requires giving them the opportunity to choose to the extent they are able, whether or not to participate in research." This sentence suggests that, yes, individuals can meaningfully consent to research that they cannot reasonably be expected to understand. It's also accepted, in the domain of medicine, for example, that lay (i.e., non-medically expert) individuals can give consent for medical procedures. If that metaphor resonates, though, it's also worth noting that in that domain, medical practitioners can be (and routinely are) sued for malpractice.
At the least, it seems to me that both the spirit and the letter of this last-quoted passage require rethinking how information on these topics has traditionally been communicated to users (who, even if they do understand, may still feel undue pressure to comply).
I do my own work on the assumption that users must be able to consent (i.e., that it must be the case that users could conceivably consent), and that informed consent that is genuinely both "informed" and "consent" is what allows a given research project to be called ethically conducted. But even with that working assumption allowing ongoing work (albeit only under rigorous philosophical as well as technological controls), this question remains on my mind: can users consent to data mining if they can't realistically be expected to substantively understand the risks?
I have lingering, niggling, doubting emotions about this question, even if not yet a full-enough vocabulary to sort through them.