An example screencast showing the “Document Contextual Emphasis” approach.

An Introduction to "Document Contextual Emphasis"

This post largely reprints the Readme for the Document Contextual Emphasis project, which I authored and then released with the University of Pennsylvania under an MIT License. This post is released under AdUnumDatum’s default license terms, with that provenance and the MIT license’s expectations in mind.

As part of my work in the University of Pennsylvania Libraries, I have developed a markup-based approach for embedding a searchable Frequently Asked Questions (FAQ) list into a document, such as a privacy policy. I have posted a fork-able GitHub repository template for creating a web page for a document that has an embedded FAQ in it, and automatically publishing that page through GitHub Pages.

For an interactive example of this approach, see here.

Motivation

Documents such as privacy policies can be dense to read and difficult to understand. Users often do not read these types of documents, at least in part because of their length and impenetrable wording.

Other approaches

Several approaches exist for facilitating user understanding of long or technical documents.

Facebook’s Data Policy, for example, features a Table of Contents whose top-level items are phrased as questions (for example, “What kinds of information do we collect?”). Bullet points below each question in the Table of Contents summarize that section’s content. While this approach makes the document easier to read than a large block of text, it conceptually separates sections from one another, implying that each section of the document is not relevant to the others. Put differently, each section or question is assumed not to overlap with any other. Facebook’s Data Policy page also does not help users view changes in the Policy over time.

While not for a privacy policy, Creative Commons produces a completely separate summary document for each of its technical license texts. This approach seems useful for communicating quickly with users, but likely discourages them from reading the legal text of the actual license (which many users would likely be unable to substantively comprehend).

“Document Contextual Emphasis”

The example repository introduces an additional approach for facilitating reader comprehension of technical or otherwise dense documents: “document contextual emphasis.” In this approach, a Frequently Asked Questions list is embedded within a document. When a question from the list is clicked, the snippets of the document relevant to that question – from entire sections to individual words or phrases – become highlighted. In this approach, sections of the document can be conceptualized as relevant to multiple questions: the same section of a text could be relevant to a question about sharing data with law enforcement officials, a question about sharing data more generally, and a question about what information about users is retained, for example.

This approach, as implemented here, carries several benefits:

• It encourages users to read the actual policy text, by highlighting sections that are relevant to the user’s interests and needs.
• It combines questions phrased in non-technical language with technical document text, providing, in the interface itself, a layer of translation to facilitate comprehension of the document. Even if a user cannot comprehend the document text, seeing a section of the text highlighted in response to a non-technically-worded question relevant to the user could allow the user to ask for assistance from others in a more targeted way than would otherwise be likely.
• It provides original context: Users can see the context of all highlighted portions of text. Non-highlighted portions of the document can be copied and pasted alongside highlighted portions.
• Document text is straightforward to update. As implemented here, documents are written in Markdown, a syntax for marking up documents that is designed to be easier to read than HTML, and is quick to learn (likely in 20-30 minutes of reading).
• Change history is preserved. Because documents are written in Markdown, their history of changes can be saved using Git, the technology on which GitHub is built.
• The display is screen-reader friendly: Users accessing a document rendered with this approach using a screen reader will receive an explanatory overview of the page in several places. Further, thanks to consultation from Kate Lynch, screen readers across platforms will voice highlighted phrases when a question is clicked.

Markup approach

This approach extends Markdown with FAQ tagging markup, using the syntax <#tag>...</#tag>.

Unlike in XML/HTML, tags are allowed to overlap, even with themselves. Overlapping tags (e.g., <#a>Lorem<#a> ipsum dolor</#a> sit amet</#a>) are read by the system as

<#a(1st instance)>Lorem<#a(2nd instance)> ipsum dolor</#a(1st instance)> sit amet</#a(2nd instance)>

rather than with the nesting that standard HTML parsing would produce:

<#a(1st instance)>Lorem<#a(2nd instance)> ipsum dolor</#a(2nd instance)> sit amet</#a(1st instance)>

This allows increased flexibility when tagging a document, especially over time or by non-developers.
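
The pairing rule described above (each closing tag closes the earliest still-open instance of the same tag name) can be sketched in a few lines of Python. This is only an illustrative sketch, not the repository's actual implementation: the function name `pair_tags`, the regular expression for tag names, and the choice to ignore unmatched closing tags are all assumptions made for this example.

```python
import re
from collections import defaultdict, deque

# Assumed tag shape: <#name> opens and </#name> closes, where name is made of
# letters, digits, hyphens, or underscores (an assumption for this sketch).
TAG_PATTERN = re.compile(r"<(/?)#([A-Za-z0-9_-]+)>")


def pair_tags(text):
    """Return (plain_text, spans) for FAQ-tagged text.

    spans is a list of (tag_name, start, end) offsets into plain_text.
    A closing tag closes the EARLIEST still-open instance of the same
    tag name (first opened, first closed), so tags may overlap.
    """
    open_starts = defaultdict(deque)  # tag name -> start offsets of open instances
    spans = []
    plain_parts = []
    plain_len = 0
    cursor = 0
    for match in TAG_PATTERN.finditer(text):
        # Copy the untagged text that precedes this tag.
        chunk = text[cursor:match.start()]
        plain_parts.append(chunk)
        plain_len += len(chunk)
        cursor = match.end()
        is_closing = match.group(1) == "/"
        name = match.group(2)
        if not is_closing:
            open_starts[name].append(plain_len)
        elif open_starts[name]:
            # Close the oldest open instance, not the most recent one.
            spans.append((name, open_starts[name].popleft(), plain_len))
        # A closing tag with no open instance is ignored in this sketch.
    plain_parts.append(text[cursor:])
    return "".join(plain_parts), spans


plain, spans = pair_tags("<#a>Lorem<#a> ipsum dolor</#a> sit amet</#a>")
for name, start, end in spans:
    print(name, repr(plain[start:end]))
```

For the example string above, the first instance of the tag covers "Lorem ipsum dolor" and the second covers " ipsum dolor sit amet". A stack (last opened, first closed) would instead reproduce the standard HTML nesting.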

Thus, in this approach, Markdown like this:

<#scope-of-policy> This is Markdown.
<#law-enforcement>

### A third-level heading.

<#law-enforcement> A list of items: </#law-enforcement>

- <#types-of-data> List item 1 </#types-of-data>
- List item 2, </#law-enforcement> with additional content, </#scope-of-policy> and more.
- List item 3, <#partial-line> with a </#partial-line> [link](https://google.com).


becomes:

<span data-relevant-to-question="scope-of-policy" markdown=1> This is Markdown.</span><!-- scope-of-policy -->
<span data-relevant-to-question="scope-of-policy" markdown=1><span data-relevant-to-question="law-enforcement" markdown=1></span><!-- scope-of-policy --></span><!-- law-enforcement -->

### <span data-relevant-to-question="law-enforcement" markdown=1><span data-relevant-to-question="scope-of-policy" markdown=1>A third-level heading.</span><!-- scope-of-policy --></span><!-- law-enforcement -->

<span data-relevant-to-question="law-enforcement" markdown=1><span data-relevant-to-question="scope-of-policy" markdown=1><span data-relevant-to-question="law-enforcement" markdown=1> A list of items: </span><!-- law-enforcement --></span><!-- scope-of-policy --></span><!-- law-enforcement -->

- <span data-relevant-to-question="law-enforcement" markdown=1><span data-relevant-to-question="scope-of-policy" markdown=1><span data-relevant-to-question="types-of-data" markdown=1> List item 1 </span><!-- types-of-data --></span><!-- scope-of-policy --></span><!-- law-enforcement -->
- <span data-relevant-to-question="law-enforcement" markdown=1><span data-relevant-to-question="scope-of-policy" markdown=1>List item 2, </span><!-- law-enforcement --> with additional content, </span><!-- scope-of-policy --> and more.
- List item 3, <span data-relevant-to-question="partial-line" markdown=1> with a </span><!-- partial-line --> [link](https://google.com).


This generated HTML is verbose; however, it remains straightforward to follow, as each closing </span> tag is followed by a comment naming the tag it closes. Further, the HTML is derived from an original Markdown document, which is easier to read and maintain!
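
To illustrate how the data-relevant-to-question attribute ties a question to text, here is a minimal Python sketch that collects the text inside every span marked for a given question. It is an assumption-laden illustration rather than the project's code: the class `QuestionTextCollector` and the function `relevant_text` are hypothetical names, and the sketch follows standard HTML nesting (each </span> closes the most recently opened span). In a browser, the equivalent step would presumably select those spans and apply a highlight style.

```python
from html.parser import HTMLParser


class QuestionTextCollector(HTMLParser):
    """Collect text inside <span data-relevant-to-question="..."> for one question."""

    def __init__(self, question):
        super().__init__()
        self.question = question
        # One boolean per currently open <span>: does it match the question?
        self.span_stack = []
        self.fragments = []

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            value = dict(attrs).get("data-relevant-to-question")
            self.span_stack.append(value == self.question)

    def handle_endtag(self, tag):
        if tag == "span" and self.span_stack:
            self.span_stack.pop()

    def handle_data(self, data):
        # Text counts as relevant if any enclosing span matches the question.
        if any(self.span_stack):
            self.fragments.append(data)


def relevant_text(html, question):
    """Return the concatenated text that is marked as relevant to `question`."""
    parser = QuestionTextCollector(question)
    parser.feed(html)
    parser.close()
    return "".join(parser.fragments)


line = (
    '- List item 3, <span data-relevant-to-question="partial-line" markdown=1>'
    " with a </span><!-- partial-line --> [link](https://google.com)."
)
print(repr(relevant_text(line, "partial-line")))
```

Only the phrase inside the partial-line span is returned; the rest of the list item stays in place as unhighlighted context.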

Conclusion

I am excited about the user interface possibilities that this approach and proof of concept present. I welcome feedback on both!