Slides and speaking notes from a Digital Library Federation Forum Panel on Data Governance and Ethics

Earlier this week, I was part of a panel discussion of CLIR Fellows at the 2017 DLF Forum conference.

The presentation’s abstract is below:

Surveying the landscape of data governance and ethics

Jacob Levernier(1), John Borghi(2), Jacqueline Quinless(3), M. Scott Thompson(4)
1: University of Pennsylvania, United States of America; 2: California Digital Library, United States of America; 3: University of Victoria, Canada; 4: CLIR Postdoctoral Fellow, United States of America

The purpose of this panel is to describe the current data governance landscape, identify tools useful for shaping features of that landscape, and consider how individuals in these areas may collaborate with each other and others, including in industry, as data sharing and re-use become more systematized, commonplace, and incentivized.


My slides are available here in ODP format (which can be opened in LibreOffice as well as in PowerPoint).

I’ve included the slides in this format because the final slide contains an animated GIF image that does not render in PDFs. Cf. the full panel’s slide deck in PDF format.

Speaking notes

My speaking notes are reprinted below. This was a brief, 6-7 minute introduction; each of us panelists agreed to reduce our initial speaking time in order to leave more time at the end of the session for discussion.

  • (Slide 1)
    I’m going to talk briefly about the University of Pennsylvania’s libraries as a case study. We’re still awaiting final University IT (and possibly General Counsel) approval for this new Library Privacy Policy, but it has gone through the Library’s Administrative Council.
  • I’d like to start by acknowledging that I’m not trying to bring up new rhetorical points – rather, my goal is to talk through the difficult, muddy conversations that are involved with actually implementing some of the Platonic-esque ideals that privacy advocates such as myself, and as, I suspect, many of you here, hold dear.
  • I’m specifically speaking today about ways to conceptualize and communicate with users about the uses of their data in a way that feels substantive – the evolution of my thought process on these issues, and how we’ve been working to address them at UPenn this year.
  • In my mind, libraries are essentially academic utilities – short of illegal means like SciHub, students and faculty can’t really choose not to use the library.
    • And in my mind, that means that those of us in libraries and similar institutions have an extra ethical imperative to be thinking about these issues – stewarding a tradition that’s existed since at least the McCarthy era of US history.
  • I say this to establish some of the values that I suspect we all here share.
  • My background is in Psychology – specifically in the psychology of values.
    • (Slide 2)
      And so I came into my position at UPenn used to thinking about trust relationships around privacy as being built on shared values. In that mindset, the paramount privacy concern was limiting the questions we ask along principled lines – and, following from that, simply not keeping data wherever possible, absent a specific reason to do so. This was a harder-line, fairly rigid approach.
      • And I still see this as accurate – but as incomplete.
        • The question then became, “If users do give substantive, informed consent to the use of their data, and truly feel they can refuse (or withdraw) that consent, is the research then ok?” (Even if it’s a question we would normally not be keeping data to answer?) And, from that, “Can users consent when the risks are so counterintuitive? What would communication that enables this understanding look like, if it’s even possible?”
        • Principles are often slow to develop. But technology changes rapidly.
          • So how do we, simply by using digital management systems, or allowing our users to bookmark items in our catalog, or just having logs on our servers, not inadvertently (let alone willfully) undermine our traditional privacy protection principles?
    • (Slide 3)
      I then started thinking about trust relationships between providers and patrons as being highly context-dependent, regardless of shared values.
      • That is, what people see as private in one circumstance, they may not in another. But the issue is with datasets being able to hop contexts more easily than ever.
      • In this view, gathering data isn’t bad in itself. Asking questions in itself isn’t necessarily the problem. Rather, feelings of lack of control over the potential uses of the data are part of what elicits feelings of a breach of trust.
  • (Slide 4)
    And then I started to think about the fact that gaining users’ trust and actually keeping their data safe are two separate (though overlapping) pieces of work.
    • With many of these approaches, especially cutting-edge ideas like “differential privacy,” it’s hard to build an implementation that’s even good enough to be dangerous. And it’s even harder to make one good enough to be safe.
    • So this in-progress example from UPenn’s libraries, in the context of this panel, is a place to talk through trying to implement methods by which research questions can still be asked, but in a way that is rigorously ethical as well as radically transparent.
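To make “differential privacy” a little more concrete: the classic Laplace mechanism adds noise, calibrated to a query’s sensitivity, to an aggregate count before it is released. A minimal sketch in Python – the epsilon value and the example query are invented for illustration, not anything from UPenn’s actual practice:

```python
import random

def noisy_count(true_count: int, epsilon: float, sensitivity: float = 1.0) -> float:
    """Release a count under the Laplace mechanism for differential privacy.

    Noise is drawn from Laplace(0, sensitivity / epsilon): a smaller epsilon
    means stronger privacy and a noisier answer. For a simple count, adding
    or removing one patron changes the result by at most 1 (the sensitivity).
    """
    scale = sensitivity / epsilon
    # A Laplace variate is the difference of two i.i.d. exponential variates.
    noise = random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)
    return true_count + noise

# Hypothetical query: "how many patrons borrowed at least one e-book this month?"
released = noisy_count(1234, epsilon=0.5)
```

Even this tiny sketch hints at why implementations are hard to get right: choosing epsilon, accounting for repeated queries, and sourcing the randomness safely are all easy places to go wrong.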
  • (Slide 5)
    Library data present many of the same ethical questions as do social media data.
    • The data are different in their content, but not really in their form or in many of the things they can leak about users, both as individuals and as groups.
    • So, we started looking at how social media companies, which are often spoken about as being willfully obtuse about data use under a pretense of transparency, communicate with their users.
    • Facebook, for example, has adopted a format of cutting their Privacy Policy into bite-sized chunks, arranged in the format of an FAQ.
      • (Slide 6)
        Google has taken a similar approach with their Privacy Policy.
      • But it can be difficult to understand how one element of the policy interacts with another element, or to see changes over time.
    • (Slide 7)
      LinkedIn, as another example, has taken the approach of adding annotations to their Privacy Policy.
  • (Slide 8)
    We settled on adding four elements to our Libraries’ Privacy Policy to enable users, as a community (even if not yet as individuals), to communicate back with us in a substantively informed way about research projects we’d like to do, and about the data that we have (and don’t have). This involves:
    1. Creating a public website that will contain a versioned history of all changes to the Privacy Policy going forward, with publicly-archived comments from a multi-week public comment period before any changes go into effect.
    2. (Slide 9)
      For any research project we do with user-generated data, publishing (and allowing comments on) a research data ethics management plan, which involves answering a series of prompts about what specific threats we foresee, and what we are doing to mitigate those (including noting for how long the data will be kept, and by whom specifically). (These are just a few of those prompts.)
    3. Performing an annual “data census” across all of the University’s libraries, and publishing a summary of that census.
    4. (Slide 10 [animated])
      Incorporating an FAQ into our new Privacy Policy in what I think is a new way: The Policy text will react to users clicking on FAQ links, showing snippets of the document that are relevant to a given question, while retaining the context of the document.
      • This also works with screen readers across platforms.
      • This way, our new policy, which is almost four times as long as our current Policy, even though it doesn’t get into legalese or loopholes, can hopefully be understood at whatever level a user wants to engage with it, whether at a glance or in-depth.
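The idea behind point 4 can be sketched as a mapping from FAQ questions to the policy sections they touch: selecting a question surfaces the relevant sections in document order, so the reader keeps the surrounding context rather than jumping to an isolated excerpt. A minimal, DOM-free Python sketch – the section IDs, section text, and questions are all invented for illustration:

```python
# Invented example policy: section id -> section text.
POLICY_SECTIONS = {
    "collection": "We collect circulation records only as needed ...",
    "retention": "Circulation records are deleted after a fixed period ...",
    "sharing": "We do not share patron data with third parties ...",
}

# The order in which sections appear in the policy document.
SECTION_ORDER = ["collection", "retention", "sharing"]

# Each FAQ question points at the policy sections relevant to it.
FAQ = {
    "How long do you keep my data?": ["retention"],
    "Who can see my borrowing history?": ["collection", "sharing"],
}

def sections_for_question(question: str) -> list:
    """Return (id, text) pairs for the sections relevant to a question,
    sorted into the order in which they appear in the policy document."""
    relevant = set(FAQ.get(question, []))
    return [(sid, POLICY_SECTIONS[sid]) for sid in SECTION_ORDER if sid in relevant]
```

In a web page, the same mapping would drive showing/hiding sections in place; because the full document stays in the markup, it remains navigable by screen readers.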
  • (Slide 11 [blank])
    • These conversations and implementations don’t have to be all at once – they can be iterative, as with our new Privacy Policy.
      • This is a big step for us, but a first step. And especially with the European Union’s General Data Protection Regulation (GDPR) coming, this is a way we’ve developed for laying a solid base for communicating with our users, with the expectation that we’ll add additional nuance over time.
    • So, that’s where I’d like to start our discussion.