Calibrated Human-in-the-Loop Short-Answer Grading
A fine-tuned language model grades student responses and emits a temperature-scaled confidence score. High-confidence predictions are auto-graded; low-confidence ones are flagged for human review. Attribution highlights the answer tokens that most influenced the grade.
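The confidence-and-routing step described above can be sketched as follows. This is a minimal illustration, not the demo's actual implementation: the function names, the temperature of 1.5, and the 0.8 threshold are all hypothetical, and the logits are assumed to be one raw score per grade class.

```python
import math

def scaled_confidence(logits, temperature):
    """Softmax over logits / T. Returns (predicted class index, confidence)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]

def route(logits, temperature=1.5, threshold=0.8):
    """Auto-grade when confidence clears the threshold, else flag for review.

    temperature and threshold are placeholder values (assumptions), not the
    settings used by the demo.
    """
    grade, conf = scaled_confidence(logits, temperature)
    decision = "auto" if conf >= threshold else "human_review"
    return grade, conf, decision

# One confident prediction and one ambiguous prediction.
print(route([6.0, 0.0, 0.0]))
print(route([1.0, 0.8, 0.0]))
```

In temperature scaling, T is typically fit on a held-out validation set by minimizing negative log-likelihood. Dividing by T rescales the logits without changing their order, so the predicted grade stays the same and only the reported confidence shifts.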
Predicted Grade
Confidence
Token Attribution
Gradient × Input: the tokens that most influenced this grade
Low to high attribution
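As a rough illustration of Gradient × Input, the sketch below scores each token by summing, over its embedding dimensions, the partial derivative of the model score multiplied by the embedding value. Everything here is an assumption made for the example: `toy_score` is a hypothetical stand-in for the grading model, and the gradient is approximated with finite differences so the code runs without any dependencies. A real system would more likely obtain gradients from an autograd framework.

```python
def grad_x_input(score_fn, embeddings, eps=1e-5):
    """Per-token attribution: sum over dims of d(score)/d(e[i][d]) * e[i][d]."""
    attributions = []
    for i, vec in enumerate(embeddings):
        total = 0.0
        for d, value in enumerate(vec):
            # Central finite difference approximates the partial derivative.
            plus = [list(v) for v in embeddings]
            minus = [list(v) for v in embeddings]
            plus[i][d] = value + eps
            minus[i][d] = value - eps
            grad = (score_fn(plus) - score_fn(minus)) / (2 * eps)
            total += grad * value
        attributions.append(total)
    return attributions

def toy_score(embeddings):
    """Hypothetical scorer: a linear readout of the mean-pooled embeddings."""
    weights = [1.0, -2.0]
    pooled = [sum(col) / len(embeddings) for col in zip(*embeddings)]
    return sum(w * p for w, p in zip(weights, pooled))

# Three tokens with 2-dimensional embeddings (made-up values).
tokens = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
attr = grad_x_input(toy_score, tokens)
print(attr)
```

For a linear scorer like this one, the attributions sum to the score itself, which makes a convenient sanity check. A display would then map each token's value onto the low-to-high color scale.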
Model Feedback