Automating Quiz Grading (for Qualtrics Data) with R

While teaching a Lab section of a Research Methods in Psychology course this summer, my fellow Lab instructors and I created a beginning-of-term quiz on the first week’s assignment.

(Incidentally, the assignment was to watch an introductory screencast on the PsycINFO/PsycNET research database, which you are also welcome to view here.)
We distributed this quiz through Qualtrics, a web-based platform that is excellent for gathering data but limited for analyzing them. Thus, I wrote a short R script to score a CSV spreadsheet of quiz data, which might look like this:

Student_ID Q1 Q2 Q3 Q4
55652125 3 5 Apples Bananas
113545621 2 5 Apples Oranges
44511235 3 2 Peaches Bananas
555123321 3 3 Apples Oranges

“Q1” here means “Question 1 response.” The columns can be named anything, though.

Although the quiz for which I wrote this code comprised only multiple choice questions, the code below would work for any type of question that has a single, definite correct answer, since each response is simply checked for an exact match against the answer key.
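Because the comparison is an exact match, a typed answer such as " apples" would be marked wrong against a key of "Apples". For free-text items, one option is to normalize both the responses and the key before scoring. The helper below, `normalize_response()`, is a hypothetical name and is not part of the script that follows; it is only a minimal sketch of the idea:

```r
# Hypothetical helper (not part of the grading script): lower-case a response
# and strip leading/trailing whitespace so that trivial differences in typing
# do not count as wrong answers.
normalize_response <- function(x) {
  tolower(trimws(as.character(x)))
}

typed_answers <- c(" Apples", "apples ", "APPLES", "Peaches")
correct_answer <- "Apples"

# Compare normalized responses against the normalized key.
is_correct <- normalize_response(typed_answers) == normalize_response(correct_answer)
print(is_correct)
```

Whether this kind of leniency is appropriate depends on the quiz; for multiple choice data exported as codes, no normalization should be needed.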

# Grading Script for Multiple Choice Questions
# Jacob Levernier
# June 2014
# For the Public Domain (CC0 License)

# This script requires that the sqldf package be installed. You can install it from within R with `install.packages('sqldf')`.

#################################
# SETTINGS -- EDIT THESE
#################################

# Set the working directory. Do NOT put a trailing slash ('/') after this:
setwd("/path/to/directory/of/dataset")

# Set the data file (this assumes that it's within the working directory set above). The file should be a CSV, with comma-separation between columns, and text delimited with a double-quotation mark ("). The file should also have a header row (i.e., the first row should be column names):
data_file_name <- "Full_Class_Quiz_Data.csv"

# Set the output CSV file that you want to create (this will be created in the working directory set above). This can be the same as data_file_name above if you just want to update that file, but note that the original file would then be overwritten with only the filtered, scored rows:
output_csv_file_name <- "PsycNet_Questions_Data_Filtered_and_Scored_for_Just_My_Lab.csv"

# List of student ID numbers to pull out of the data file (this assumes that the data file is for an entire course, and not just the specific lab or lecture section that you're teaching):
# This should be a list of comma-separated ID numbers. "c()" here means "concatenate."
list_of_student_ID_numbers <- c(55652125, 113545621, 44511235, 555123321) # (These are example ID numbers; to confirm, I'm not leaking information about my own students here.)

# The name of the column in the dataset that contains each student's Student ID number:
student_ID_column_name_in_dataset <- "Student_ID"

# Each answer here should be named the same as the column in the dataset to which it refers:
# '"Q1" = 3' here means "The answer to Q1 is '3'."
answer_key <- list(
	"Q1" = 3,
	"Q2" = 5,
	"Q3" = "Apples",
	"Q4" = "Oranges"
)

#################################
# END SETTINGS
#################################



#################################
# SCRIPT STEPS -- DO NOT EDIT THESE
#################################

# Load the sqldf library:
library('sqldf')

# Load the data:
loaded_data <- read.csv(data_file_name, header=TRUE, sep=",", quote="\"", dec=".")

# Get the data just for our lab, using SQL syntax (as if we were querying a database):
subset_of_data_to_analyze <- sqldf(paste("
SELECT * 
FROM loaded_data 
WHERE", student_ID_column_name_in_dataset, 
"IN (", paste(list_of_student_ID_numbers, collapse=","), ");")
)

###
# Grade the responses, item by item:
###

subset_of_data_to_analyze[['Student_Summed_Score']] <- 0 # Give this column an initial value. It will be added to below.

subset_of_data_to_analyze[['Total_Points_Possible_According_to_Scoring_Process']] <- 0 # Give this column an initial value. It will be added to below.

for (answer_key_entry in seq_along(answer_key)) {
	# If the student got the answer correct, give a score of 1 for that answer. sapply() here calculates this for all students at once, rather than going through a student-by-student loop:
	question_name <- names(answer_key)[answer_key_entry]
	scored_column_name <- paste0(question_name, "_scored") # paste0() joins the two parts without a space, giving, e.g., "Q1_scored".
	subset_of_data_to_analyze[[scored_column_name]] <- sapply(
			subset_of_data_to_analyze[[question_name]], 
			function(x) if (isTRUE(x == answer_key[[question_name]])) {1} else {0} # isTRUE() scores a missing (NA) response as 0 instead of stopping with an error.
			# If statement help here was from http://stackoverflow.com/a/13112387/1940466
		)
	
	# Add the score of the current question to the student's current total score:
	subset_of_data_to_analyze[['Student_Summed_Score']] <- subset_of_data_to_analyze[['Student_Summed_Score']] + subset_of_data_to_analyze[[scored_column_name]]
	
	# Add 1 to a column that essentially calculates the total possible number of points. This is what the percentage grades below are based on, and also can serve as a check that the actual sum of points is the same as what you meant for there to be.
	subset_of_data_to_analyze[['Total_Points_Possible_According_to_Scoring_Process']] <- subset_of_data_to_analyze[['Total_Points_Possible_According_to_Scoring_Process']] + 1
}

# Give a percentage grade for each student (stored as a proportion between 0 and 1, e.g., 0.75 for 75%):
subset_of_data_to_analyze[['Student_Percentage_Grade']] <- subset_of_data_to_analyze[['Student_Summed_Score']] / subset_of_data_to_analyze[['Total_Points_Possible_According_to_Scoring_Process']]

# Write a CSV file with the results:
write.table(subset_of_data_to_analyze, file=output_csv_file_name, append = FALSE, quote = TRUE, sep = ",", eol = "\n", na = "NA", dec = ".", row.names = FALSE, col.names = TRUE)

#################################
# END SCRIPT STEPS
#################################

Given the data above, the script would produce output like this:

Student_ID Q1 Q2 Q3 Q4 Student_Summed_Score Total_Points_Possible_According_to_Scoring_Process Q1_scored Q2_scored Q3_scored Q4_scored Student_Percentage_Grade
55652125 3 5 Apples Bananas 3 4 1 1 1 0 0.75
113545621 2 5 Apples Oranges 3 4 0 1 1 1 0.75
44511235 3 2 Peaches Bananas 1 4 1 0 0 0 0.25
555123321 3 3 Apples Oranges 3 4 1 0 1 1 0.75
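
To check the scoring logic without a CSV file or sqldf, the example can be rebuilt in base R. The sketch below is an alternative formulation rather than the script above: it assumes the example data typed in by hand as a data frame, uses illustrative names (`quiz`, `lab_ids`, `key`), filters with `%in%` instead of an SQL query, and compares each column to its key in one vectorized step.

```r
# Example data from the table above, typed in by hand (illustrative only).
quiz <- data.frame(
  Student_ID = c(55652125, 113545621, 44511235, 555123321),
  Q1 = c(3, 2, 3, 3),
  Q2 = c(5, 5, 2, 3),
  Q3 = c("Apples", "Apples", "Peaches", "Apples"),
  Q4 = c("Bananas", "Oranges", "Bananas", "Oranges"),
  stringsAsFactors = FALSE
)

lab_ids <- c(55652125, 113545621, 44511235, 555123321)
key <- list(Q1 = 3, Q2 = 5, Q3 = "Apples", Q4 = "Oranges")

# Keep only the rows for this lab (a base-R stand-in for the SQL query).
lab <- quiz[quiz$Student_ID %in% lab_ids, ]

# One 0/1 column per question; a missing response (NA) counts as incorrect.
for (q in names(key)) {
  lab[[paste0(q, "_scored")]] <- as.integer(lab[[q]] %in% key[[q]])
}

# Sum the per-question scores and convert to a proportion of the total.
scored_columns <- paste0(names(key), "_scored")
lab$Student_Summed_Score <- rowSums(lab[scored_columns])
lab$Student_Percentage_Grade <- lab$Student_Summed_Score / length(key)

print(lab[c("Student_ID", "Student_Summed_Score", "Student_Percentage_Grade")])
```

The summed scores and grades this prints should agree with the corresponding columns in the output shown above.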

Although systems such as Qualtrics sometimes are able to do basic scoring such as this, I like that this script is modular and separate from the data-gathering mechanism (in this case, Qualtrics). It would not be difficult to adapt this code to individually weight questions, or to use a more complicated scoring mechanism. For basic quizzes, though, it works as-is.
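
As one illustration of the weighting idea, the sketch below scores a small data frame using per-question point values. The `weights` vector and its values are hypothetical (the original quiz counted every question as 1 point), and this is a standalone example rather than a drop-in change to the script above.

```r
# Answer key from the example, plus hypothetical per-question point values.
answer_key <- list(Q1 = 3, Q2 = 5, Q3 = "Apples", Q4 = "Oranges")
weights <- c(Q1 = 1, Q2 = 1, Q3 = 2, Q4 = 2)

# Two of the example students from the data shown earlier.
responses <- data.frame(
  Student_ID = c(55652125, 44511235),
  Q1 = c(3, 3),
  Q2 = c(5, 2),
  Q3 = c("Apples", "Peaches"),
  Q4 = c("Bananas", "Bananas"),
  stringsAsFactors = FALSE
)

# Add each question's weight to the running total when the response matches
# the key; a missing response (NA) earns no points.
responses$Weighted_Score <- 0
for (q in names(answer_key)) {
  is_correct <- responses[[q]] %in% answer_key[[q]]
  responses$Weighted_Score <- responses$Weighted_Score + weights[[q]] * is_correct
}

# Express the score as a proportion of the total points available.
responses$Weighted_Percentage <- responses$Weighted_Score / sum(weights)

print(responses[c("Student_ID", "Weighted_Score", "Weighted_Percentage")])
```

Keeping the weights in a named vector alongside the answer key means that changing a question's point value is a one-line edit in the settings, in the same spirit as the rest of the script.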
