After completing a prompt-tuning experiment, you notice that the model generates relevant responses with high accuracy, but the fluency and grammatical correctness of its outputs seem suboptimal. Which statistical metric would most directly indicate this issue, and what action should you take to improve the output?