Sunday, September 20, 2015

CrowdGrader can now check submission similarity

One of the problems of crowd-grading is that no single person grades all submissions. Thus, it is difficult to detect when students submit the same solution.

To help detect similar submissions, we have implemented a similarity checker in CrowdGrader. And not just any similarity checker: the most full-featured one we could wish for.

The feature is accessible from an assignment page by selecting Submissions > Check Similarity.
Note that it is still in beta; please report any problems.

Input formats

The CrowdGrader similarity checker can process:

  • Text typed directly in CrowdGrader
  • Attached Word (docx, not doc), PDF, HTML, and RTF documents.
  • Attached source files in any programming language (comments are handled correctly in C, C++, Java, and Python).
  • zip, tar, and tgz archives. Within an archive, you can specify which subset of files should be processed, so that the similarity results are meaningful rather than drowned out by a myriad of standard files.
  • gzip-compressed versions of any of the above files.
CrowdGrader also accepts any nesting of the above; for instance, a zip file containing multiple Word files is fine.
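To illustrate how this kind of preprocessing can work, here is a minimal Python sketch; it is not CrowdGrader's actual code. It assumes each submission arrives as raw bytes plus a file name, and it uses glob patterns (the hypothetical `patterns` argument) to stand in for the "subset of files" setting. The helper names `iter_files`, `strip_c_comments`, and `strip_python_comments` are likewise hypothetical. Nested archives are unpacked recursively, and comments are removed from source files so that adding or deleting comments cannot mask copied code.

```python
import fnmatch
import gzip
import io
import tarfile
import tokenize
import zipfile


def iter_files(name, data, patterns):
    """Yield (path, bytes) for every non-archive file reachable from `data`,
    descending into nested zip, tar and gzip containers. Only files whose
    path matches one of the glob `patterns` are kept."""
    lower = name.lower()
    if lower.endswith(".zip"):
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            for info in zf.infolist():
                if not info.is_dir():
                    yield from iter_files(f"{name}/{info.filename}", zf.read(info), patterns)
    elif lower.endswith((".tar", ".tgz", ".tar.gz")):
        # "r:*" lets tarfile detect gzip compression on its own.
        with tarfile.open(fileobj=io.BytesIO(data), mode="r:*") as tf:
            for member in tf.getmembers():
                if member.isfile():
                    content = tf.extractfile(member).read()
                    yield from iter_files(f"{name}/{member.name}", content, patterns)
    elif lower.endswith(".gz"):
        # A single gzip-compressed file: drop ".gz" and look at what is inside.
        yield from iter_files(name[:-3], gzip.decompress(data), patterns)
    elif any(fnmatch.fnmatch(name, p) for p in patterns):
        yield name, data


def strip_c_comments(src: str) -> str:
    """Remove // and /* */ comments from C, C++ or Java source,
    leaving string and character literals untouched."""
    out = []
    i, n = 0, len(src)
    while i < n:
        c = src[i]
        if c in "\"'":
            # Copy a string or char literal verbatim, honoring backslash escapes.
            j = i + 1
            while j < n and src[j] != c:
                j += 2 if src[j] == "\\" else 1
            out.append(src[i:j + 1])
            i = j + 1
        elif src.startswith("//", i):
            # Line comment: skip up to, but not including, the newline.
            j = src.find("\n", i)
            i = n if j == -1 else j
        elif src.startswith("/*", i):
            # Block comment: replace with a space so adjacent tokens stay apart.
            j = src.find("*/", i + 2)
            i = n if j == -1 else j + 2
            out.append(" ")
        else:
            out.append(c)
            i += 1
    return "".join(out)


def strip_python_comments(src: str) -> str:
    """Remove # comments from Python source using the standard tokenizer,
    which already knows that a # inside a string is not a comment."""
    tokens = tokenize.generate_tokens(io.StringIO(src).readline)
    kept = [tok for tok in tokens if tok.type != tokenize.COMMENT]
    return tokenize.untokenize(kept)
```

The C-style stripper is deliberately simple: it respects ordinary string and character literals but does not understand C++11 raw string literals. For Python, delegating to `tokenize` avoids re-implementing the language's string rules.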

Comparison output

CrowdGrader distinguishes between text that is: 
  • Unchanged: equal in the two submissions
  • Renamed: uniformly renamed (for instance, when a variable is renamed)
  • Different
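One plausible way to separate these three categories is sketched below in Python, under our own assumptions; it is not CrowdGrader's published algorithm. The two token streams are aligned after collapsing every identifier to a placeholder, so renamed code still lines up. Then each aligned pair is inspected: identical tokens are unchanged, and differing identifiers count as renamed only while they follow one consistent one-to-one mapping. The tokenizer regex and the function name `classify` are illustrative.

```python
import difflib
import re

# Identifiers, integer literals, or any other single non-space character.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def normalize(tok):
    # Collapse every identifier to one placeholder so renamed code still aligns.
    return "ID" if re.match(r"[A-Za-z_]", tok) else tok


def classify(src_a, src_b):
    """Label each token of src_a as 'unchanged', 'renamed' or 'different'
    with respect to src_b."""
    ta, tb = TOKEN.findall(src_a), TOKEN.findall(src_b)
    labels = ["different"] * len(ta)
    fwd, bwd = {}, {}  # identifier mapping a -> b and its inverse
    matcher = difflib.SequenceMatcher(
        None, [normalize(t) for t in ta], [normalize(t) for t in tb], autojunk=False
    )
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            x, y = ta[block.a + k], tb[block.b + k]
            if x == y:
                labels[block.a + k] = "unchanged"
            elif fwd.get(x, y) == y and bwd.get(y, x) == x:
                # x and y are both identifiers (they normalized equal but differ);
                # accept the renaming only if it is new or matches earlier uses.
                fwd[x], bwd[y] = y, x
                labels[block.a + k] = "renamed"
    return list(zip(ta, labels))
```

For example, `classify("x = x + 1", "y = z + 1")` labels the first `x` as renamed (to `y`) but the second `x` as different, because `x` cannot be uniformly renamed to both `y` and `z`. Since the sketch collapses every word to the same placeholder, language keywords are treated like identifiers; a real checker would probably exclude them.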
CrowdGrader also clusters similar submissions for you, according to a similarity threshold of your choice.
Perhaps the best way to see all this is to look at a couple of screenshots.



Submissions are clustered according to their similarity.
You can dynamically vary the similarity threshold and explore the resulting clusters.
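Threshold-based clustering can be sketched as single-linkage grouping: connect every pair of submissions whose similarity reaches the threshold and report the connected components. The Python sketch below assumes a hypothetical `similarity(a, b)` function returning a score between 0 and 1. CrowdGrader's actual scoring and clustering method is not described in this post, so treat this only as an illustration.

```python
from itertools import combinations


def clusters(similarity, names, threshold):
    """Group submissions whose pairwise similarity is >= threshold,
    using single linkage (connected components via union-find)."""
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the root, halving the path as we go.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in combinations(names, 2):
        if similarity(a, b) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    # Only groups with two or more members point at possibly shared work.
    return [g for g in groups.values() if len(g) > 1]


# Made-up pairwise scores, for illustration only.
scores = {
    frozenset(("alice", "bob")): 0.92,
    frozenset(("bob", "carol")): 0.81,
    frozenset(("alice", "carol")): 0.40,
}
sim = lambda a, b: scores.get(frozenset((a, b)), 0.0)
names = ["alice", "bob", "carol", "dave"]
print(clusters(sim, names, 0.8))
print(clusters(sim, names, 0.9))
```

Re-running `clusters` with a different `threshold` mirrors varying the threshold interactively: lowering it merges more submissions into larger clusters. Note that single linkage chains: at 0.8, alice and carol land in the same cluster through bob even though their own score is only 0.40.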

You can examine similar submissions side-by-side.
Identical content is highlighted in blue; content that has been renamed is highlighted in green.