To help detect similar submissions, we have implemented a similarity checker in CrowdGrader. And not just any similarity checker: the most full-featured similarity checker we could wish for.
The feature is accessible from an assignment page by selecting Submissions > Check Similarity.
Note that it is still in beta; please report any problems.
Input formats
The CrowdGrader similarity checker can process:
- Text typed directly in CrowdGrader.
- Attached Word (docx, not doc), PDF, HTML, and RTF documents.
- Attached source files in any programming language (comments are handled correctly in C, C++, Java, and Python).
- zip, tar, and tgz archives. Within an archive, you can specify which subset of files should be processed, so that the similarity results are meaningful and not drowned out by a myriad of standard files.
- gzip-compressed versions of any of the above files.
CrowdGrader also accepts any nesting of the above; for instance, multiple Word files included in a single zip file are fine.
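To make the idea of nested input concrete, here is a minimal Python sketch of how a submission could be unpacked recursively. It is not CrowdGrader's actual code: the function name `iter_files`, the `patterns` argument (standing in for the file-subset selection), and the decision to treat docx files as opaque documents are all assumptions made for illustration.

```python
import fnmatch
import gzip
import io
import tarfile
import zipfile

# Assumption: docx files are zip containers, but we want them handled
# as documents rather than unpacked, so they are treated as leaves.
OPAQUE_SUFFIXES = (".docx",)


def iter_files(name, data, patterns=("*",)):
    """Yield (path, bytes) for each leaf file in possibly nested archives.

    `patterns` is a hypothetical list of glob patterns selecting which
    files should be kept for comparison.
    """
    opaque = name.lower().endswith(OPAQUE_SUFFIXES)
    buf = io.BytesIO(data)
    if not opaque and zipfile.is_zipfile(buf):
        with zipfile.ZipFile(buf) as zf:
            for info in zf.infolist():
                if not info.is_dir():
                    yield from iter_files(f"{name}/{info.filename}",
                                          zf.read(info), patterns)
        return
    if not opaque:
        buf.seek(0)
        try:
            # Mode "r:*" transparently handles plain and compressed tar files.
            with tarfile.open(fileobj=buf, mode="r:*") as tf:
                for member in tf.getmembers():
                    if member.isfile():
                        content = tf.extractfile(member).read()
                        yield from iter_files(f"{name}/{member.name}",
                                              content, patterns)
            return
        except tarfile.ReadError:
            pass  # not a tar archive; fall through
        if data[:2] == b"\x1f\x8b":  # gzip magic number
            inner = name[:-3] if name.endswith(".gz") else name
            yield from iter_files(inner, gzip.decompress(data), patterns)
            return
    # Leaf file: keep it only if its base name matches a selected pattern.
    base = name.rsplit("/", 1)[-1]
    if any(fnmatch.fnmatch(base, p) for p in patterns):
        yield name, data
```

For example, `dict(iter_files("submission.zip", raw_bytes, patterns=("*.py",)))` would return only the Python sources, however deeply they are nested in the upload.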
CrowdGrader distinguishes between text that is:
- Unchanged: equal in the two submissions
- Renamed: uniformly renamed (for instance, when a variable is renamed)
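The distinction between the two categories can be illustrated with a small Python sketch. This is only an assumed, simplified approach (comparing two code fragments token by token and looking for a consistent one-to-one renaming of identifiers), not a description of how CrowdGrader itself works; the function name `classify` and the tokenizer are hypothetical.

```python
import keyword
import re

# Very rough tokenizer: identifiers, integer literals, or any single symbol.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def is_name(token):
    """True for identifiers that are not Python keywords."""
    return token.isidentifier() and not keyword.iskeyword(token)


def classify(a, b):
    """Return 'unchanged', 'renamed', or 'different' for two fragments."""
    ta, tb = TOKEN.findall(a), TOKEN.findall(b)
    if ta == tb:
        return "unchanged"  # equal up to whitespace
    if len(ta) != len(tb):
        return "different"
    forward, backward = {}, {}
    for x, y in zip(ta, tb):
        if not (is_name(x) and is_name(y)):
            if x != y:
                return "different"  # operators, literals, keywords must match
            continue
        # The renaming must be uniform: one-to-one in both directions.
        if forward.setdefault(x, y) != y or backward.setdefault(y, x) != x:
            return "different"
    return "renamed"
```

With this sketch, `classify("total = total + x", "acc = acc + y")` yields `"renamed"`, because `total` always maps to `acc` and `x` to `y`, whereas `classify("a = a + b", "c = d + e")` yields `"different"`, because `a` would have to map to both `c` and `d`.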
CrowdGrader also clusters similar submissions for you, according to a threshold of your choice.
Perhaps the best way to see this is to look at a couple of screenshots.
[Screenshot: Submissions are clustered according to their similarity.]
You can dynamically vary the similarity threshold and explore the resulting clusters.
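As a rough illustration of how a threshold turns pairwise similarities into clusters, the Python sketch below groups submissions whose similarity is at least the chosen threshold. Everything in it is an assumption for the sake of the example: the similarity measure (Jaccard overlap of three-word shingles), the single-link grouping via union-find, and the names `jaccard` and `cluster`. CrowdGrader's own measure and clustering method may differ.

```python
from itertools import combinations


def shingles(text, k=3):
    """Set of k-word windows; short texts yield one window of all words."""
    words = text.split()
    return {tuple(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def jaccard(a, b):
    """Jaccard similarity of the shingle sets of two texts, in [0, 1]."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb)


def cluster(submissions, threshold):
    """Group submission names whose similarity reaches `threshold`.

    `submissions` maps a name to its text. Two submissions end up in the
    same cluster if a chain of sufficiently similar pairs connects them.
    """
    parent = {name: name for name in submissions}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(submissions, 2):
        if jaccard(submissions[a], submissions[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in submissions:
        groups.setdefault(find(name), []).append(name)
    # Largest clusters first, names sorted within each cluster.
    return sorted((sorted(g) for g in groups.values()),
                  key=lambda g: (-len(g), g))
```

Calling `cluster` repeatedly with different thresholds mimics moving the threshold control: lowering it merges clusters, raising it splits them apart.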
[Screenshot: You can examine similar submissions side-by-side.]
Identical content is highlighted in blue; content that has been renamed is highlighted in green.