Creative, AI-proof, out-of-class assessments are possible—but take a lot of work

As students rely increasingly on AI, faculty are increasingly administering graded assessments in person. But much of higher education is now online; over half of undergraduates, according to one estimate. Even for in-person courses, moving assessment in person reduces instructional time and isn't feasible for every assessment type.

What are the non-in-person options for accurate grading?

In a recent op-ed, an NYU Vice-Provost advocated oral exams. While he envisions them being in person, they can be done over Zoom. Unfortunately, AI can feed students answers during online oral exams. In my case, oral exams sometimes failed—at least when students knew the potential questions in advance. Something more is needed. Maybe individualized, surprise questions would be enough—or maybe not.

What other twists on oral exams—or entirely different approaches—might work? And what about skills for which oral exams aren’t suitable? Options will likely vary by subject, level, and even specific skills within a course. Here are some approaches I’ve thought of or heard about.

Data analysis. Students record a screen-capture video as they conduct their analysis, explaining what they are doing, why, and how they interpret the results.

Coding. Students write code in advance and then explain why they coded that way. The examiner could then change the task or ask the student to code a different approach.

Political science, history, communications… Two teams debate in real time. One presents an argument while the other poses questions. They then switch roles. Evaluation is based on the quality of both their questions and answers.

Math. Give each student a unique problem to solve while explaining and justifying their reasoning aloud.

Many subjects. In an online written test, insert short oral-response questions at random times that must be answered within a limited window.

Economics. Have students choose and analyze a local policy issue, such as gentrification in their neighborhood. After their presentation, or after reading their written analysis, ask follow-up "what if" questions that test their understanding of policy changes.

While I am trying to be broad and general, my views are shaped by what I teach—research methods for MPA students. Please add your ideas and experiences in the comments.

I see several threads running through these and related approaches:

Audio or video: Disrupts easy cut-and-paste use of AI—students must at least speak the words AI might have provided.

Real-time: Limits the time available to consult AI or create and revise tailored prompts.

Individualized: Requires students to answer questions unknown to them and only for them.

Applied to individual circumstances: Requires students to apply skills to their own situations and produce answers not already available.

Interactive: Allows follow-up twists and probes that make questions less predictable and more individual.

Are they practical?

These approaches are generally more substantive, relevant, and demanding than much of what's now common. Think of the multiple-choice and short-answer questions on tests, or even the traditional out-of-class essay. Because these methods are more demanding, students may struggle and push back, especially given how little time they have, or expect to spend.

The time limits that are inherent in these approaches, however, push in the opposite direction. Major analysis and research papers unfold over time, with multiple revisions and feedback. We can still assign such projects and require students to present and answer probing questions, but once they get those questions, they can feed them to AI for the next revision.

Many of these approaches require real-time interactions. Adding synchronous requirements to asynchronous courses undermines some of the reasons students choose them. At most, such requirements can be added only infrequently and with very flexible scheduling.

Some of these strategies will not survive further advances in AI. It can already mimic a person’s voice. It will be able to handle increasingly tailored questions, as its data includes situations increasingly like a student’s own.

But the real killer is how much faculty time all this takes.

Figuring out the right approach takes a lot of time. There are many approaches to explore because what works will vary by course and by skill within a course. Having a student perform and explain data analysis in real time can ensure they can carry out, explain, and interpret their own basic analysis. But it won’t show whether they can interpret and critically assess more involved analyses conducted by others.

Each implementation takes a lot of time. Watching data-analysis videos takes far longer than quickly reviewing written results and interpretations. Being fair also requires time. To assess the role-switching debate teams fairly, for example, you must watch, and often rewatch, the videos. Compare that with reading a stack of essays.

And there is a tradeoff between time spent and how well skills and understanding are measured. Ask fewer questions and the skills measured are less comprehensive—a less valid measure. Ask fewer questions and you are more likely to hit on a random strength or weakness of students—a less reliable measure.

Interactive approaches cannot be handed off to less knowledgeable people. The examiners need to develop, on their feet, probing questions and twists specific to each student's examples.

It all just takes a ton of time. The majority of faculty have nothing like that time.

Faculty at community colleges typically teach eight or more courses per year, and those classes may have many students. The media’s focus on elite higher education obscures these realities.

Faculty at four-year colleges, and especially more elite ones, teach much less. But faculty at those colleges are hired, promoted, and rewarded, not for teaching, but for research. Even when teaching is assessed and considered, it is usually through student evaluations and therefore what pleases students. Students may like more relevant assessments, but may not like, or have time for, added work. Putting a lot of time into valid assessments will likely damage a professor’s career.

I have tenure and teach relatively small classes. I can choose to spend this time, though at a big cost to everything else I want to do. And after discovering that AI can hack my oral exams, I am a bit gun-shy.

Some believe that AI can solve the faculty time problem. One start-up says it can "scale oral assessment" using the professor's own materials, image, and voice. OK while it lasts. But if AI can grade it, then AI can hack it.

There is one silver lining: AI and other technology can free up time for assessments by reducing the time needed for teaching. Once videos and online activities are created—a very time-consuming task—they can be reused, and good videos can be shared widely. AI can also provide individualized tutoring. One student took my practice problems and sample solutions and used AI to generate effective tutoring tailored to their needs.

I believe that teachers still need to interact with students, but those interactions can mostly be tailored ones. All the time we spent on the same old, same old content is obsolete.

What should we faculty and our institutions do now?

Those of us who are able should work on these more innovative, time-consuming approaches. Since this work generally brings no career rewards, higher ed collectively and individual institutions should support and reward it. It's in the public interest. And I predict that in the long run, it will also pay off privately by enhancing reputation.

Still, even with a concerted effort, I’m not optimistic that viable non-in-person grading strategies exist for many skills and subjects. That’s certainly true for mass education in its current form and scale. As with in-person exams for fully online programs, the time involved will make these approaches infrequent and therefore high-stakes.

These forces will disrupt the structure and scale of higher education. Some programs may fragment; others may die. For some degrees and subjects, new independent assessment-only entities may emerge. For others, grading may disappear, with institutions focusing solely on teaching those who want to learn. I have no answers, of course, but we need to face these possibilities squarely and plan how we might adapt.

2 thoughts on “Creative, AI-proof, out-of-class assessments are possible—but take a lot of work”

  1. Thank you, Dahlia. I really could have written much of what you say here. I'm faculty in a state school with a 5/4 teaching load (and do 2 summer courses as well) – with class sizes in the hundreds now. Most years I teach in the neighborhood of 600+ students. I have loved my teaching career but, in the last year, have found myself wondering how I'll make it through the next 5 years until retirement. I never wanted to be a cop (investigating and patrolling AI and cheating), nor a secretary (the amount of admin work we have to do increases every semester). Sorry for the rant… but your piece really hit home. Thank you.

    1. Tessa,

      The media are failing to describe your experiences, which are like those of most faculty in higher ed. They are really missing this, focusing instead on elite colleges and on faculty who teach few classes, with either few students or TAs.

      Thanks for commenting,
      Dahlia
