AD·VNVM·DATVM: Down to a single bit of data

Introducing Markdown Mapper

In short:

I've written a command-line program in R to facilitate reverse-engineering concept maps from plain-text notes (especially those written in Markdown).

I'm very excited to announce a project on which I've been working for the past several months. It's called Markdown Mapper, and is a command-line utility, written in R (see below) and open-sourced under the GPLv2 license, that reverse-engineers concept maps from plaintext notes.

Introduction / Example

Here are some plaintext notes. They are preferably written in Markdown, though any text notes will do. They might be saved in a file called, e.g., Example.markdown or Example.txt:

---
Title: An example file
Author: Jacob Levernier
Year: 2015
Other: You can add as much metadata as you want!
---

This text is written in Markdown style.

* You can have bullet-lists.
    * There can be sub-items.
    * This is a second sub-item.
* In addition to bullet-lists, you can have numbered lists.

1. This is the first item of a numbered list.
    1. Just as with bullet lists, there can be sub-items.
1. This is the second item of a numbered list.

There are a few additional features of this text that go beyond what Markdown normally offers.  
For example, you can add as many +{hashtags} to each line as you like.  
This line has two +{hashtags}, one of which is embedded in this sentence, and the other of which comes after it: +{example}

+{Hashtags} can be delimited by any marker you like (e.g., #hashtag, +hashtag, @hashtag, etc.). The default notation, with curly braces, is nice because it allows +{hashtags with spaces}.

The intended use for +{Markdown Mapper} is for +{notes} (e.g., class notes, or notes on +{academic articles}).

When you're reading +{academic articles}, you may come across a point that you want to remember especially well.  
-->You can mark a note as being of especial importance by surrounding it with arrows.<--

You might also want to note original ideas that you had while reading. You can note those by marking them like this: {{This is an original idea that I want to remember.}}

{{Original ideas}} can be combined with -->+{notes} of importance<-- as well as +{hashtags}.

Markdown Mapper goes through text files like this line-by-line and reverse-engineers a network map from them.

* For bullet lists, items are linked by parent/child status.
    * This is a "child" line; the line immediately above, with one less indentation, is the "parent."
        * You can indent as many times as you want.

You can also mark a line as being linked to the line above it, even if it's not in a list or connected through a hashtag.  
To do that, just use this symbol anywhere in the line: ^^^ 

Throughout this post, including in the example text above, I use the term "hashtag" (e.g., #hashtag) when I'm really writing more generally about tags (e.g., +tag, #tag, @tag). I've chosen to do that because I suspect that many readers will immediately understand references to hashtags, more so than they would references to "tags" more generally.
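
The markers used in the example file are regular enough to be picked out with simple patterns. The following is a minimal sketch in Python (not the script's own R code) of how one line of notes might be scanned for them. The function name `describe_line` and the patterns are my own assumptions for illustration, covering only the default `+{tag}` notation.

```python
import re

# Patterns for the markers used in the example file (assumed, illustrative).
TAG = re.compile(r"\+\{(.*?)\}")        # +{tag}, spaces allowed
IMPORTANT = re.compile(r"-->(.*?)<--")   # -->note of importance<--
ORIGINAL = re.compile(r"\{\{(.*?)\}\}")  # {{original idea}}
LINK_TO_PREVIOUS = "^^^"                 # link to the line above


def describe_line(line):
    """Return the markers found in one line of notes."""
    return {
        "tags": [tag.lower() for tag in TAG.findall(line)],
        "important": IMPORTANT.findall(line),
        "original": ORIGINAL.findall(line),
        "linked_to_previous": LINK_TO_PREVIOUS in line,
    }


line = ("{{Original ideas}} can be combined with "
        "-->+{notes} of importance<-- as well as +{hashtags}.")
print(describe_line(line))
```

Tags are lower-cased here so that +{Hashtags} and +{hashtags} count as the same tag, which is consistent with the tag counts shown later in this post.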

Markdown Mapper takes notes like those above, and turns them into maps like this:

[Figure: network map generated from the example file]

The metadata from the top of the file is in the center of the map, connected to everything. For this example, let's take those nodes away for a moment, to make the map easier to read:

[Figure: the same network map with the metadata nodes removed]

Features

Markdown Mapper has a growing list of features. Currently, these include the following:

The master list of all tags used in the given files is as follows:
                              Count
    +{hashtags}                 4
    +{academic articles}        2
    +{notes}                    2
    +{example}                  1
    +{hashtags with spaces}     1
    +{markdown mapper}          1
Processing the following dictionary of terms:
'+{hashtags with spaces}'   =>  '+{hashtags}' 

The master list of all tags used in the given files is as follows:
                     Count
+{hashtags}              5
+{academic articles}     2
+{notes}                 2
+{example}               1
+{markdown mapper}       1
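
The two tag lists above differ only in that the dictionary folds '+{hashtags with spaces}' into '+{hashtags}'. A rough sketch of that kind of counting, in Python rather than the script's R, might look like the following. The function name `count_tags` is hypothetical, and I'm assuming that every occurrence of a tag is counted and that matching ignores case.

```python
import re
from collections import Counter

TAG = re.compile(r"\+\{(.*?)\}")


def count_tags(lines, dictionary=None):
    """Count tags case-insensitively, optionally merging synonyms.

    `dictionary` maps a tag to the tag that should replace it.
    """
    dictionary = dictionary or {}
    counts = Counter()
    for line in lines:
        for tag in TAG.findall(line):
            tag = "+{" + tag.lower() + "}"
            counts[dictionary.get(tag, tag)] += 1
    return counts


notes = [
    "You can add as many +{hashtags} to each line as you like.",
    "The default notation allows +{hashtags with spaces}.",
    "+{Hashtags} can be combined with +{notes}.",
]
merged = count_tags(notes, {"+{hashtags with spaces}": "+{hashtags}"})
for tag, count in merged.most_common():
    print(f"{tag:<25}{count}")
```

Without the dictionary, the three sample lines give '+{hashtags}' a count of 2; with it, the count rises to 3 and '+{hashtags with spaces}' disappears from the list.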

Background

"Why," you might be asking, "would you want to create a network map from text notes?"

When I was an undergraduate, I was fortunate to study for a year at Blackfriars Hall in Oxford, UK. The University of Oxford follows a "tutorial" system, in which each student meets once per week with a "tutor" (a don or other expert in the field). The meetings are traditionally one-to-one. Over the previous week, the student will have prepared a 2,000-3,000 word essay in response to a prompt of a few sentences and a substantial reading list from the tutor. The student reads the essay aloud and then, for the remainder of the hour, orally defends it under questions from the tutor. The process then repeats, typically with two tutorials (each on a different subject) per week.

The high volume of reading and writing (1-2 essays per week) required by this system encouraged me to systematize my note-taking. For each essay, I had to find a way to read and annotate a variety of sources for ~4 days, compose and refine a thesis statement and outline on the 5th day, and then be able to write the essay on days 5-6. I experimented with taking notes in the margins of photocopies or scans, but found that it was too difficult to find passages without a substantial delay. Adding labeled sticky notes to each page helped this, but made the notes for a source ephemeral, especially if I was annotating a book from a library. Eventually, I started experimenting with hand-drawing concept maps, which have two primary benefits:

  1. They're non-linear. Sections of a source can easily refer back to other sections.
  2. They're spatial. Especially when taking the time to hand-write annotations, I found that I developed a spatial sense of where a point had been raised and documented, allowing me to somewhat intuitively look for passages in certain sections of the page immediately, decreasing time spent looking for specific points or references. This spatial quality also allowed me to view all of my notes on a source at a single glance, facilitating decisions about which sections could support an argument, and which could not.

Each node of the network would start with a page or section number. Through use over weeks, I found that maps worked best in my workflow when they had three "levels:" a level for normal notes; a level for notes of especial importance; and a level for my own original ideas, written down as they were prompted by my reading. Toward the end of each week, I could (ideally) rapidly decide on a thesis statement for an essay by reading over all of the original ideas that I had recorded while reading, and then fleshing out the most promising two or three.

[Figure: a hand-drawn concept map, with outline sections noted in red next to each node]

Hand-written maps worked well, because they took more time than typing (facilitating thinking as I wrote), and were easily portable (no laptop — with the distractions that come from having an active wireless connection — required). When preparing to write an essay, I would create a detailed outline, and then, next to each node, write (in the example above, in red) all outline sections for which the node would be relevant. This allowed writing each section by looking at a glance for relevant references. This approach was very time-consuming, but only because it front-loaded the bibliographic work; the writing process was comparatively quick.

Eventually, I became concerned that these maps were not searchable, and so began looking for a digital solution. Having surveyed many available concept mapping software packages, I began using VUE ("Visual Understanding Environment"), a free project from Tufts University. Although I've not used it extensively, Docear looks especially useful for an academic workflow, as well. My VUE maps looked like this, using the same three-level approach (regular notes, "nota bene" notes, and original thoughts) as the handwritten notes above:

[Figure: a concept map made in VUE]

VUE is a very useful and featureful program. Eventually, however, I became hesitant to take notes using it, because a) the approach is often more time-consuming than taking traditional linear notes, and b) it's hard to use the data elsewhere. VUE can export to XML, and similar programs can sometimes export to plain text, but the format is in both cases often either difficult to read or difficult to easily re-use.

For the last four years, I've thus been thinking about how best to take notes. I wanted a method that allows both linear and non-linear annotating, is plain-text-based (to allow greatest reuse and to minimize storage format lock-in), and doesn't require storage in a format, like XML, that is hard for a human to read. I now think I'm a step closer to having a satisfying answer. Notes can be kept linearly in plain-text files. They can be linked to original source files (e.g., PDFs) with tools such as Zotero. And they can be linked together using in-line metadata.

Existing Tools

Several tools get close to this idea, and I am grateful for having read about them as I planned an approach for the Markdown Mapper script below.

Brett Terpstra created a script for System Services and PopClip on OSX to allow converting an outline into markdown-style plaintext. This text can then be pasted into an application such as MindNode, which will create a hierarchical concept map from it. To my knowledge, this approach does not allow automatic non-linear linking between parts of the outline, but would be useful for many workflows. Mr. Terpstra has also written about using MindMeister, another software package for concept mapping, to create mindmap-based presentations.

ConnectedText, which is freely available for Windows, looks like a very helpful program for auto-generating mindmaps from text notes, and introduced a set of suggested conventions for linking sections of plain-text notes. Although I didn't use these conventions when writing Markdown Mapper, favoring a more flexible "hashtag" approach to creating links, it was encouraging to see this implementation.

Markdown Mapper

How to use it

Currently, Markdown Mapper is written in R (see below), and designed to be run with the Rscript command (which is installed as part of R, and allows R scripts to be run from the command line). Because it is just an R script, it should work across all platforms. Do note, however, that it currently has only been tested on Linux, and does use an X11 window to generate a pop-up quick-view graph. To run the script you need to do a few things:

  1. Download and install R. Due to library dependencies, you'll need R version ≥ 3.1.0.
  2. Download and install Python v2.7+ (according to the R argparse GitHub page, Python v3.2+ does work, as well), including the argparse and json packages (which I think come installed by default in many Python distributions). Note that Python may already be installed on your system if you use Linux or Mac OSX.
  3. Download the Markdown Mapper script, "Markdown_Note_Grapher.R". The most up-to-date version can be found here on GitHub.
  4. Open a terminal. Run the script with Rscript /path/to/Markdown_Note_Grapher.R path/to/the_text_file_you_want_to_map.txt. You could also create a shortcut to this command using, e.g., a bash 'Alias' file. I added a line to my .alias file so that I can summon Markdown Mapper with the command concept-map.

The first time you run the script, it will likely install several R packages. Some of these can take quite a while to install. After that first time, however, the script runs much more quickly. Do note that if you're installing R for the first time, it may be necessary on some systems to run R once from the command line and install a package (any package) manually, using, e.g., install.packages('argparse'). This will allow R to prompt you if it needs to create a new directory for installing packages in the future — lacking this one-time prompt, the Markdown Mapper script might fail with an error message.

When using Markdown Mapper, you can list as many text files as you want. By default, the script will output the master tag list after looking across all of the files you gave to it, then will display a quick-view network graph so that you can see, at a glance, the network structure of the notes. This graph can be saved as a PDF with the --quick-view-graph-name flag. For the example file above, that would look like this:

[Figure: quick-view network graph for the example file]

I recommend looking through the Help documentation, which describes all of the available flags, using Rscript /path/to/Markdown_Note_Grapher.R --help. Do note, though, that the documentation is a work in progress.

The script should take care of installing any R packages that it needs. The point here is that you don't need to know R (or Python) to run this script.

Why did I write it in R? Really, it's just that it was a good practice project. R is also a good choice if I want to add any sort of statistical analysis options in the future. Having said that, I'm considering porting the code to Python, as well, if it would facilitate freezing the code into standalone binary executables for Linux, OSX, and Windows, for easier distribution. With a standalone file, users wouldn't have to install anything. They would just download the file and run it from the command line. I'm thinking of using PyInstaller, but welcome suggestions for other approaches, as this will also be a training project. My hesitation is that igraph's plotting feature requires a separate install of the cairo graphics library, which might defeat the original purpose of the port. For now, regardless, this R script works just fine for its intended purpose.

Plain text notes

Markdown Mapper uses plaintext notes, which makes it especially powerful. This power derives from the fact that keeping notes in plain text opens them up to all of the utilities of the *nix command line. If, e.g., I wanted to search the text file above for lines that contain the phrase '+{hashtags}', and then map only those lines, I could do it, in one command:

grep --ignore-case "+{hashtags}" example1.txt | concept-map --suppress-file-names

grep is a command for searching over text. --ignore-case makes the command match '+{hashtags}' as well as '+{Hashtags}'. As mentioned above, I've set my computer up to call Markdown Mapper using the command concept-map. The pipe ('|') directs the output of grep into concept-map, which reads its input from stdin (standard input), the Unix stream through which a program receives text typed into the terminal or piped from another program.

Here's the output:

The master list of all tags used in the given files is as follows:
                              Count
    +{hashtags}                 4
    +{example}                  1
    +{hashtags with spaces}     1
    +{notes}                    1
    Generating quick-view network graph...

[Figure: quick-view network graph of only the lines that contain '+{hashtags}']

One could also run this command over a whole directory of files. It's really powerful stuff.

Beyond concept mapping, storing files in plain text allows immediate transformations. If one wanted to view the example text above without the tag markers ('+{}'), this command would do it:

perl -p -e 's/\+\{(.*?)\}/$1/g' example1.txt

A selection of the output from that command looks like this (note that the +{} wrappers are gone from 'academic articles,' 'hashtags', etc.):

When you're reading academic articles, you may come across a point that you want to remember especially well.  
-->You can mark a note as being of especial importance by surrounding it with arrows.<--

You might also want to note original ideas that you had while reading. You can note those by marking them like this: {{This is an original idea that I want to remember.}}

{{Original ideas}} can be combined with -->notes of importance<-- as well as hashtags.
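
For readers who prefer Python to Perl, roughly the same transformation can be sketched with the standard `re` module. The function name `strip_tag_markers` is my own, chosen only for illustration.

```python
import re


def strip_tag_markers(text):
    """Replace every +{tag} with the bare tag text."""
    return re.sub(r"\+\{(.*?)\}", r"\1", text)


line = ("{{Original ideas}} can be combined with "
        "-->+{notes} of importance<-- as well as +{hashtags}.")
print(strip_tag_markers(line))
```

Only the +{} wrappers are removed; the arrows and the double curly braces are left alone, as in the Perl output above.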

How it works

Markdown Mapper treats each line / paragraph of text as a node in the network. It goes through the text, file-by-file and then line-by-line, making inferences about which lines are related to which other lines by looking at tags and text structure (e.g., with the indentation of list-items). As it goes, it creates an edge list, a table with three columns: From, To, and Relationship, where the From and To columns are lines of text from the input file(s), and the Relationship column is the relationship between them (e.g., Parent, List item, Tag, Contains Type of Thought, etc.).

It then uses the qgraph package to draw a quick-view graph from the edge list, and, if the user has asked for one, the igraph package to create an adjacency matrix (a tabular version of an edge list, where the row and column names are lines of text from the input file(s), and each cell has a 1 wherever the two lines have a relationship).
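
To make the edge list and adjacency matrix more concrete, here is a simplified sketch in Python. It is not the script's R code. It handles only two of the relationships (list nesting and tags), and it rests on assumptions of my own: the function names `build_edge_list` and `adjacency_matrix` are hypothetical, each tag is treated as a node of its own, and the matrix is treated as undirected.

```python
import re

TAG = re.compile(r"\+\{(.*?)\}")
LIST_ITEM = re.compile(r"^(\s*)(?:[*+-]|\d+\.)\s+")


def build_edge_list(lines):
    """Return (from, to, relationship) rows for list nesting and tags."""
    edges = []
    stack = []  # (indent, text) of the list items that are still "open"
    for raw in lines:
        text = raw.strip()
        if not text:
            continue
        match = LIST_ITEM.match(raw)
        if match:
            indent = len(match.group(1))
            # Items at the same or deeper indentation cannot be parents.
            while stack and stack[-1][0] >= indent:
                stack.pop()
            if stack:
                edges.append((stack[-1][1], text, "Parent"))
            stack.append((indent, text))
        else:
            stack = []  # a non-list line ends the current list
        for tag in TAG.findall(text):
            edges.append((text, "+{" + tag.lower() + "}", "Tag"))
    return edges


def adjacency_matrix(edges):
    """Return (node names, square 0/1 matrix) for an undirected graph."""
    nodes = []
    for source, target, _ in edges:
        for node in (source, target):
            if node not in nodes:
                nodes.append(node)
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for source, target, _ in edges:
        matrix[index[source]][index[target]] = 1
        matrix[index[target]][index[source]] = 1
    return nodes, matrix


notes = [
    "* For bullet lists, items are linked by parent/child status.",
    "    * This is a child line about +{notes}.",
    "        * You can indent as many times as you want.",
    "Plain lines can carry +{notes} too.",
]
edges = build_edge_list(notes)
for row in edges:
    print(row)
nodes, matrix = adjacency_matrix(edges)
```

In this sketch, the second and fourth sample lines end up connected through the shared '+{notes}' node, which is the kind of non-linear link that tags are meant to provide.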

As it goes along, Markdown Mapper also hard-wraps the text (that is, inserts a line break) every 20 or so characters, in order to make the text nodes more rectangular (this keeps each node from displaying as one long line of text).
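
As an illustration of hard-wrapping (again in Python, not the script's R code), the standard `textwrap` module can break a line at word boundaries so that no line exceeds a given width. The helper name `hard_wrap` and the exact width of 20 are assumptions for this sketch.

```python
import textwrap


def hard_wrap(text, width=20):
    """Insert line breaks so that no line is longer than `width` characters."""
    return textwrap.fill(text, width=width)


note = "You can mark a note as being of especial importance by surrounding it with arrows."
print(hard_wrap(note))
```

Breaking at word boundaries, rather than at exactly every 20th character, keeps the words in each node intact.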

Considerations

Markdown Mapper is a tool for one's academic toolbox. It's not designed for everything. Since it treats each line of text as a node in the network, it isn't designed for text files that have big code chunks, for example. UPDATE 2015-04-12: As of v0.1.2, Markdown Mapper can create single nodes out of multi-line blocks of text.

The main Markdown elements to which Markdown Mapper pays attention are lists. Features like **bold** text, *italicized* text, and # Headers just get re-printed in plain-text form. Headers aren't currently treated as special, meaning that they sometimes become throwaway nodes in the network, disconnected from the text below them. I use Markdown enough that seeing text occasionally printed *like this* doesn't distract me. I also don't mind losing some of the information that headers contain. However, if you have a better idea for handling these, I'd be happy to hear it, and to collaborate on changes.

Summary

Markdown Mapper is an open-source script, written in R and meant to be run from the command line (using Rscript), that reverse-engineers concept maps from plaintext notes. I think that it's built from a powerful idea and goes beyond what other existing tools offer, either in non-linear flexibility or in being open. My next step is possibly to port the project to Python to increase ease of distribution.

Because logos can be useful, I've adapted this Markdown logo, which was created and dedicated to the Public Domain by Dustin Curtis (cf. the logo's GitHub repository):

[Markdown logo]

From it, I've created this new mark, not only for Markdown Mapper, but also for Markdown concept mapping more generally:

[Markdown Mapper mark]

If you like, feel free to download the Inkscape SVG file. I'm releasing it under a CC-BY license, meaning that it can be used for commercial purposes, adapted, etc., with attribution.