AI Co-Drivers Shouldn’t Drive You Off the Track: A New Filter Makes Sure They Don’t

By Frederick d’Oleire Uquillas, Science Communications Fellow for the AI Lab

Ever had a driver-assist system swoop in at the last possible instant? Effective, sure, but not exactly confidence-inspiring. In high-speed settings, the first challenge is reliably preventing accidents in real time. The second challenge is to do that without spooking the driver.

A new study from Princeton University and the Toyota Research Institute, supervised by Jaime Fernández Fisac, assistant professor of electrical and computer engineering, tackles both: Safety with Agency: Human-Centered Safety Filter with Application to AI-Assisted Motorsports (Oh, Lidard, Hu, Sinhmar, Lazarski, Gopinath, Sumner, DeCastro, Rosman, Leonard, Fernández Fisac; 2025). Their system, called the Human-Centered Safety Filter (HCSF), isn’t a driver-assist “mode.” It is a real-time safety filter that sits between human inputs and the car, quietly ensuring safety while preserving the driver’s freedom to push the limits.

The Problem: Safety That Can Feel Like a Takeover

Many safety filters let the human drive – steer, brake, accelerate – until a predefined safety boundary is about to be breached (e.g., braking too late for a sharp corner, or a steering input that would carry you past track limits). Cross that boundary and the system executes a hard override to avert the failure. These “last-resort safety filters” (LRSFs) prevent disasters, but because intervention happens at the brink, the correction can feel abrupt and disempowering.

The team asked: Can a real-time safety filter be both reliably effective and gentle enough to avoid “automation surprise”?

Meet the Human-Centered Safety Filter (HCSF)

HCSF is a safety filter for shared control: It monitors the state of the vehicle and the human’s intended action, then outputs the closest safe action, nudging early and lightly and scaling up only when truly necessary.

The filter learns a state-action safety value, Q(s,a), also called the safety critic. Intuitively, Q(s,a) scores how safe it is to take action a in state s, measured by how far you can stay from future failure (e.g., off-track or collision).

From the Q function, they also define the state safety value, V(s) = max_a Q(s,a): the safest attainable future margin if one chooses the best action now. The key insight: The learned Q(s,a) is a valid state-action control barrier function (a Q-CBF). That lets the filter compare the human’s candidate action and nearby alternatives directly by their future safety margins, with no analytical dynamics model required.
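
To make these objects concrete, here’s a minimal PyTorch sketch of a learned safety critic and the state value it induces. Everything in it (the architecture, the sizes, and the trick of approximating max_a Q(s,a) by scoring a set of sampled candidate actions) is an illustrative assumption, not the paper’s implementation.

```python
import torch
import torch.nn as nn

class SafetyCritic(nn.Module):
    """Toy safety critic Q(s, a): scores the safety margin of taking action a
    in state s (positive = margin to spare, negative = failure is unavoidable).
    Architecture and sizes are illustrative, not the paper's."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        # state: (N, state_dim), action: (N, action_dim) -> (N,) safety scores
        return self.net(torch.cat([state, action], dim=-1)).squeeze(-1)


def state_value(critic: SafetyCritic, state: torch.Tensor,
                candidates: torch.Tensor) -> torch.Tensor:
    """Approximate V(s) = max_a Q(s, a) by maximizing over sampled candidate
    actions. state: (state_dim,); candidates: (num_candidates, action_dim)."""
    states = state.expand(candidates.shape[0], -1)
    return critic(states, candidates).max()
```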

At each moment, HCSF solves a small constrained optimization problem: Choose the action closest to the human’s input that also keeps safety from degrading too fast. In other words, act early to stave off danger, and use small, proactive corrections that become firmer only as one approaches the “no-return” boundary, letting you skim the edge but never cross it.
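
In code terms, that small problem is a projection onto the safe action set. Here’s one hedged sketch of a filtering step: it samples actions near the human’s input and enforces an assumed discrete-time CBF-style condition, Q(s,a) >= (1 - γ)·V(s), so the safety margin can only decay slowly. The sampling approach and the exact constraint are our illustration, not the paper’s solver.

```python
import torch

def hcsf_step(critic, state, human_action,
              n_samples: int = 256, noise_scale: float = 0.2,
              gamma: float = 0.1) -> torch.Tensor:
    """One illustrative HCSF-style filtering step (not the paper's algorithm).

    Among sampled candidate actions satisfying the assumed safety condition
    Q(s, a) >= (1 - gamma) * V(s), return the one closest to the human's
    intended action."""
    with torch.no_grad():
        # Candidates: the human's action plus random perturbations of it.
        noise = noise_scale * torch.randn(n_samples, human_action.shape[-1])
        candidates = torch.cat(
            [human_action.unsqueeze(0), human_action.unsqueeze(0) + noise])
        states = state.expand(candidates.shape[0], -1)
        q = critic(states, candidates)

        v = q.max()                       # V(s), approximated over candidates
        safe = q >= (1.0 - gamma) * v     # CBF-style "decay slowly" condition
        if not safe.any():                # nothing qualifies: take safest action
            return candidates[q.argmax()]

        dist = (candidates - human_action).norm(dim=-1)
        dist[~safe] = float("inf")        # rule out unsafe candidates
        return candidates[dist.argmin()]
```

Because the human’s own action is among the candidates, the filter passes it through untouched whenever it already satisfies the constraint; corrections appear only when it doesn’t, and only as large as needed.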

The team trains the safety critic with model-free reinforcement learning inside a high-fidelity racing simulator known as Assetto Corsa (shown below in an excerpt from their Figure 1). “Model-free” here means no explicit analytical vehicle model is required; the learning uses the simulator as a black box to experience cause and effect.
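Training such a critic model-free typically comes down to a temporal-difference backup that propagates the worst safety margin encountered along a trajectory, in the spirit of the discounted safety Bellman equation from the Hamilton-Jacobi safety RL literature. The sketch below assumes that form, along with a margin function l(s) such as signed distance to the track edge; the paper’s exact training objective may differ.

```python
import torch

def safety_td_target(target_critic, margin: torch.Tensor, next_state,
                     next_candidates, done: bool,
                     gamma: float = 0.99) -> torch.Tensor:
    """Illustrative TD target for a safety critic (assumed form, in the spirit
    of the discounted safety Bellman backup):

        Q(s, a) <- (1 - gamma) * l(s) + gamma * min( l(s), max_a' Q(s', a') )

    where l(s) is a signed safety margin (e.g., distance to the track edge,
    negative once off-track) and the simulator supplies (s, a, s') transitions
    as a black box."""
    with torch.no_grad():
        if done:                          # terminal: no future margin to back up
            backup = margin
        else:
            states = next_state.expand(next_candidates.shape[0], -1)
            v_next = target_critic(states, next_candidates).max()
            backup = torch.minimum(margin, v_next)
        return (1.0 - gamma) * margin + gamma * backup

# Training then regresses Q(s, a) toward this target with a standard
# squared-error loss over transitions collected in the simulator.
```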

Why Racing, and Why a Simulator?

Because racing is a stress test for shared autonomy: split-second decisions, high speeds, and tight safety margins. Assetto Corsa is a high-fidelity simulator widely used by serious drivers. It prioritizes realism over “gamey” fun, making it a rigorous platform for both training the safety critic and testing real-time interventions without risking carbon fiber – or collarbones. Figure 2 of their paper, shown below, displays what this looks like in action.

What Makes It Work?

The team ran an in-person study with 83 participants in the simulator, comparing three between-subjects conditions:

  1. No safety filter (control/placebo)
  2. LRSF (last-resort safety filter)
  3. HCSF (the proposed human-centered filter)

They analyzed both trajectory data (e.g., off-track incidents, collisions, failures, smoothness/jerk) and participant ratings (e.g., confidence in safety, sense of control/agency, comfort/smoothness, overall satisfaction).
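
For a sense of what “smoothness/jerk” means operationally, here’s one standard recipe: a finite-difference estimate of RMS jerk from a logged trajectory. The study defines its own statistics, so treat this function and its inputs as an illustrative stand-in.

```python
import numpy as np

def rms_jerk(positions: np.ndarray, dt: float) -> float:
    """RMS jerk of a trajectory sampled at a fixed timestep dt.

    positions: (N, 2) array of x/y samples. Jerk is the third time derivative
    of position, estimated here by repeated finite differences; abrupt
    last-instant overrides show up as spikes in this signal."""
    velocity = np.gradient(positions, dt, axis=0)
    acceleration = np.gradient(velocity, dt, axis=0)
    jerk = np.gradient(acceleration, dt, axis=0)
    return float(np.sqrt(np.mean(np.sum(jerk ** 2, axis=1))))
```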

4 Key Takeaways

  1. Safety: HCSF maintained near-zero failures, substantially improving safety relative to no filter and at least matching LRSF on robustness.
  2. Agency: Participants driving with HCSF reported a stronger sense of control than participants driving with LRSF, consistent with smaller, earlier corrections rather than last-instant overrides.
  3. Comfort: HCSF produced lower jerk and smoother input trajectories than LRSF, aligning with the goal of proactive nudging over emergency intervention.
  4. Satisfaction: Reported satisfaction was higher with HCSF than LRSF, with qualitative metrics (trust, predictability, competence) trending in HCSF’s favor.

Why This Matters

Any domain where humans and autonomous systems share control – think surgical robots, drones, or even household devices – faces the same dual mandate: stay safe and keep humans in the loop.

HCSF shows a pragmatic path: Learn a safety-aware value function, enforce it as a Q-CBF, and minimize deviation from human intent while keeping the system inside the safe set.

Put differently: Protect without patronizing.

Making AI Co-Pilots We Actually Want: The Final Lap

The Human-Centered Safety Filter reframes the co-pilot as a principled, real-time safety filter: one that reliably prevents failure while respecting the human’s hands on the wheel. It’s a step toward AI that knows when to help and when to let you handle the car, pushing hard, staying safe, and keeping the experience yours.


Curious to learn more? You can read the full paper on arXiv.
