This past summer, I taught a fully online course. On June 1, I thought my grading would be AI-resistant. By August 30, I thought online grading was dead.
Why was I confident in June? After all, AI already had most teachers at all levels moving writing assignments in person. Blue book sales were booming. And I knew that AI could already do some of my longstanding assignments, like interpreting a randomized experiment study.
I was confident partly because, just months earlier, many of my assignments and tests seemed AI-proof. Most of all, I trusted my plan to give oral exams over Zoom.
Technological progress showed my hubris on both counts.
I teach causal research methods and data analysis to Master of Public Administration (MPA) students, most working as practitioners. I’m passionate about this course, and some of the content and assignments are quite original (if I do say so myself). The summer version was online and asynchronous, with no scheduled Zoom classes.
What changed between the early days of AI and last summer? In the early days, a few students produced sentence salad without the analysis taught: just bad work, however it was done. This past summer, some students produced good analysis but used terms well beyond what I taught. In the early days, a few students produced diagrams that looked right superficially but were substantively wrong or irrelevant. This past summer, some students produced good diagrams, and when I hovered over them, a pop-up warned that AI-generated images might contain errors. But mostly, they did not contain errors.
Still, I wasn’t too worried about grading accurately because I planned to use oral exams. First would come a written online exam with open-ended, applied, multi-part questions. Then a 15-minute oral follow-up over Zoom, where I’d ask students to explain a few of their written answers. If they couldn’t, I’d lower their final exam score, potentially to zero. It had worked the year before.
In many cases, it still worked. I saw the usual mix of mistakes across a range of performance levels, and oral exams that mostly matched the written exams. As in years of teaching in person, some students reported that studying for the exam helped them finally “get” the ideas.
In other cases, oral exams also worked when learning failed. Some students wrote good answers but couldn’t explain them. My syllabus let me mark down their entire exam. A learning failure, but an assessment success.
But then came the spectacular failures. Some students who didn’t understand—based on multiple independent indicators—still produced excellent written and oral exams. At first, I thought they had human help in real time. But then I checked: Cluely can do this. It feeds answers to both written and oral questions.
It didn’t happen that much. But it happened.
I felt shattered. I put a lot of myself into this course. I put a lot of time and energy into the oral exams (designing, scheduling, giving, rewatching videos, determining grades), and it still didn’t work. When I thought about how much further AI would improve, fair and accurate grading seemed impossible.
This is not a story about my students. Many wanted to learn. Many did learn. Most did not cheat. It’s a story about higher education in general. AI abuse permeates higher education, including the most prestigious universities. A Columbia undergraduate invented Cluely, explicitly to help students cheat. Professional master’s students are not immune.
Maybe you think the real problem is online learning. True: humans can help students cheat too, and that’s long been a concern with “distance education.” But human help during an oral exam requires skilled labor at exactly the right time—limited and potentially expensive. AI is readily available and cheap.
Or maybe you think there are technical fixes. Software claims it can block Cluely. OK for now. But this is an arms race. Tech will win, and higher ed will lose.
Or maybe you think professors will solve this with more interaction and innovation. With hindsight, I could have constructed a far more AI-resistant oral exam. I should not have asked only about written questions they had already seen. Unfortunately, every idea I’ve thought of or heard about is very labor-intensive. Have students give examples, then create variations and ask them questions on the variations, add more twists and turns… Too much work for wide use.
Or maybe you think it’s pointless to assess what AI can do. Why insist students do what they no longer need to do? We are far from knowing what skills people need in an AI world. Yet I believe that humans will still need to think, even for work and certainly for society. To do that, students must practice on simpler problems that AI can solve. But if we can’t test students on simpler problems, we won’t know—and they won’t know—if they learned to think.
In late August, a widely read opinion essay proposed oral exams as one way to “prevent lazy AI use” that undermines the “work necessary for learning.” Colleagues congratulated me for being ahead of the curve. But having just discovered that AI could hack my oral exams, I felt ahead only in realizing how hard real solutions are.
So where does that leave me, already committed to teaching my asynchronous course next summer? For now, I’m focusing on the students who want to learn. AI may have killed online grading, but it hasn’t killed online learning.
