A Converter for Full KeePass2 XML Export to CSV

A friend recently asked for a recommendation for a password manager,

Password Manager: A program that allows you to store all of your passwords in a single encrypted location, usually protected by one master password. Password managers can also help with generating passwords that are as random as possible, and with using random passwords (preferably one per website or account) without having to remember them all.
and I ended up recommending KeePass2, a free and open-source password manager that works across Linux, Mac OSX, and Windows. I recommended KeePass2 over a separate program called KeePassX because KeePassX’s stable version hasn’t been updated for several years, and its newer versions are still in the beta testing stage.

KeePass can import CSV files (which can be created from any spreadsheet program), and can record whatever columns one likes for each record (e.g., Username, Password, Email Address, Security Question Response, Phone Number, etc.). However, I disgruntedly realized after importing the full set of passwords, while KeePass2 can import user-defined columns from a CSV, it cannot export them. The two formats that KeePass2 does export with all user-defined columns (in addition to the basic Username, Password, and Notes) are .kdbx (the standard KeePass2 file format) and XML. This is by design, and is understandable, given how much information KeePass2 can store; regardless, it is an inconvenient decision for any user who wants to export to a spreadsheet (e.g., to try out another password manager in the future).

Thus, I wrote a script in python to convert a KeePass XML export file into a CSV (spreadsheet) file. This converter takes all of the columns present in the XML file and includes them in the CSV. It also takes the “History” section of each entry (in which KeePass2 stores all previous passwords) and puts them together in a column with the format “As of 2014-09-11, the password was ‘X’; As of 2014-09-13, the password was ‘Y’.”

I am releasing this under the MIT license, meaning that you can do what you want with it. Please do let me know if you create improvements that you would like to share!

If I update this in the future, I will post the repository to BitBucket or GitHub, and will add a link here.

##################################################
# Script for creating a CSV file from a KeePass2 XML Export file
# Jacob Levernier (adunumdatum.org)
# September 2014
# Distributed under the MIT License
##################################################

#########################
# SETTTINGS
#########################

data_file_to_parse="/path/to/Keepass2_File.xml"

#DO NOT put a trailing slash after this
output_directory="/path/to/output/directory"

name_of_output_file="output.csv"

#########################
# END SETTTINGS
#########################


import sys # For using exit() below if necessary.

# Attempt to load packages, and kill the script if they're not available. This approach follows, e.g., http://stackoverflow.com/a/1051266 (cf. the answer (though not the original question, or other answers) at http://stackoverflow.com/a/3388249):
try:
    from bs4 import BeautifulSoup
except ImportError, e:
    print "Sorry, you must install the 'bs4' (BeautifulSoup) package to run this script."
    sys.exit(1)

try:
    import csv
except ImportError, e:
    print "Sorry, you must install the 'csv' package to run this script."
    sys.exit(1)

try:
    import os
except ImportError, e:
    print "Sorry, you must install the 'os' package to run this script."
    sys.exit(1)

try:
    import shutil # For copying files, following, e.g., http://stackoverflow.com/a/123212
except ImportError, e:
    print "Sorry, you must install the 'shutil' package to run this script."
    sys.exit(1)


xml = open(data_file_to_parse, 'r')
code = xml.read()
xml.close()
soup = BeautifulSoup(code)
groups = soup.findChildren('group') # This splits the soup up by groups


# Move into the output directory:
if(not output_directory == ""):
    if(not os.path.isdir(output_directory)): # If the directory doesn't exist, we'll create it now:
        os.mkdir(output_directory)
    os.chdir(output_directory)
    
# Get a list of all keys used in the XML file (in the key-value structure). We're specifically looking for keys that are NOT part of 'history' sections, since those will go in their own column:

all_keys_in_file = soup.findChildren('key', recursive=True)
all_keys_in_file = [key.string for key in all_keys_in_file if not key.findParent('history')]
all_keys_in_file += ('History', 'Expiry Time', 'Expires (True/False)', 'Group Name') # Add a few addional values, and reduce everything to a set() (i.e., a set of unique values from the list)
all_keys_in_file = set(all_keys_in_file)

# Check if output file exists:
if(os.path.isfile(name_of_output_file)):
    print "Outputfile (",name_of_output_file,") already exists. To avoid overwriting it, we are copying the existing file to a new file with '_BACKUP_BEFORE_RUNNING_SCRIPT' appended to its name."
    shutil.copy(name_of_output_file, name_of_output_file+'_BACKUP_BEFORE_RUNNING_SCRIPT')

# If the file doesn't yet exist, create it, with a header row.
with open(name_of_output_file, 'w') as file_to_write: # 'a' for append
    row = csv.writer(file_to_write, delimiter=',')
    row.writerow(list(all_keys_in_file)) # Add a header row



# Go through the password groups, one by one, and parse entries within in:

for group in groups:
    group_name = group.findChild('name').string
    
    for entry in group.findChildren('entry', recursive=False):
        # The recursive flag, set to false, makes sure that we don't dive more than one level into the hierarchy (since there are other 'entry' nodes, e.g., within the 'history' node of each overarching entry.
        
        keys_and_values_for_this_entry = {} # (Re-)initialize a blank dict for this entry. We'll use this below.
        
        # Add the group to the row to write:
        keys_and_values_for_this_entry["Group Name"] = group_name
        
        keys_and_values_for_this_entry['History'] = "" # Initialize a blank value. We will use this below.
        
        entry_uuid = entry.uuid.string # We will, like the XML file, use this as an index/ID for this entry when we're lining up all of the columns and rows to write the CSV throughout the rest of this script.
        
        keys_and_values_for_this_entry["Expires (True/False)"] = entry.times.expires.string
        if keys_and_values_for_this_entry["Expires (True/False)"] == "True":
            keys_and_values_for_this_entry["Expiry Time"] = entry.times.expirytime.string
        else:
            keys_and_values_for_this_entry["Expiry Time"] = ""
        
        previous_key_history_entry_creation_time = "" # Initialize this (this isn't necessary for the code, but makes it more readable, I think. We'll use this below.
        
        for key_entry in entry.findChildren('key', recursive=True):
            # If the key is part of the "history" subsection of the entry, don't use it (we'll use it below, in that case):
            key = key_entry.string
            value = key_entry.findParent().findChild('value').string

            if not key_entry.findParent('history'):
                keys_and_values_for_this_entry[key] = value
                
                history_entry_creation_time = ""
                
            else:
                #print "IS HISTORY" # Good for debugging
                
                # If the key is part of the "history" subsection of the entry, we'll use it here:
                # Add to the history summary, making one big string, for each history entry within the overarching entry:
                current_key_history_entry_creation_time = key_entry.findParent('entry').findChild('lastmodificationtime').string
                
                #print "UUID IS",entry_uuid,"PREV IS",previous_key_history_entry_creation_time,"; CURR IS ",current_key_history_entry_creation_time # Good for debugging
                
                if not previous_key_history_entry_creation_time == current_key_history_entry_creation_time:
                    
                    if not previous_key_history_entry_creation_time == "":
                        # If this isn't the first history entry we're dealing with, and there is actually content in the previous_key_history_entry_creation_time object, we'll add a newline to separate the previous history entry from this one:
                        keys_and_values_for_this_entry['History'] += '\n'
                    
                    keys_and_values_for_this_entry['History'] += "As of "+current_key_history_entry_creation_time+", the entry was as follows: " # Since we are presumably at a new history entry (since we're at a new history_entry_creation_time), add an extra phrase to the summary of the entry's history.
                    previous_key_history_entry_creation_time = current_key_history_entry_creation_time
                    
                    keys_and_values_for_this_entry['History'] += "Expires (True/False): "+key_entry.findParent('entry').times.expires.string+"; "
                    if key_entry.findParent('entry').times.expires.string == "Y":
                        keys_and_values_for_this_entry['History'] += "Expiry Time: "+key_entry.findParent('entry').times.expirytime.string+"; "
                
                if not value == None:
                    keys_and_values_for_this_entry['History'] += key+": "+value+"; "
                else:
                    keys_and_values_for_this_entry['History'] += key+": "+"''; "               
        
        # We now have all of the key-value pairs for the entry, and a single-string summarizing all 'history' entries for the overarching entry. Finalize a single-string summary of the 'history' sub-entries, if there were any:
        
        combined_row_to_write = []
        
        for key in list(all_keys_in_file):
            if key in keys_and_values_for_this_entry:
                # If this entry DOES have a value corresponding to this key:
                
                if keys_and_values_for_this_entry[key] is None: 
                    # If this is None (i.e., an empty dict. value)...
                    combined_row_to_write.append("")
                else:
                    combined_row_to_write.append(keys_and_values_for_this_entry[key])
            else: 
                # If not, we will add a placeholder (so that the CSV file still lines up):
                combined_row_to_write.append("")
        
        with open(name_of_output_file, 'a') as file_to_write: # 'a' for append
            row = csv.writer(file_to_write, delimiter=',')
            row.writerow([x.encode('utf-8') for x in combined_row_to_write]) # This approach comes from http://stackoverflow.com/a/16957257
                
                

A final note: If you do end up using KeePass2 (or any other password manager) to import passwords from some other system, and have to store the unprotected CSV file on your hard drive temporarily, do see my note here.

Related

Next
Previous