UR 2026 · Ubiquitous Robots · Osaka, Japan

Robust Assistive Mobile Manipulation via Structured LLM Programs, Confirmation Loops, and Hierarchical Skill Recovery

Trung Bui et al.
Korea Electronics Technology Institute (KETI)

July 2026 · Ritsumeikan University, Osaka

care robot

Why an assistive mobile manipulator?

  • People with disabilities need help with everyday object tasks — fetch, deliver, tidy
  • Homes are cluttered, dynamic, unstructured — fixed scripts break
  • Language is the natural interface: “bring me the coffee can on the table”
LLMs plan well — but a wrong step on a real robot is expensive. Robustness must come from the system, not the model alone.
cluttered lab scene from robot camera

Three mechanisms for robustness

  • 1 · Structured LLM programs — the planner emits typed, schema-constrained skill calls, gated by a zero-token symbolic check before execution
  • 2 · Confirmation loops — every executed step is verified in layers (skill result → symbolic state → VLM), and the operator can inspect & edit the robot’s belief live
  • 3 · Hierarchical skill recovery — failures are repaired at the cheapest possible level: in-skill correction → suffix-only replan → bounded full replan
Together: language flexibility + execution safety + fault tolerance — on one integrated robot.

System at a glance

system overview
  • robot_agent — robot-agnostic runtime: closed planning loop, world state, verifier, FastAPI + WebSocket
  • SAGE planner (pyplanner) + a single open-weight LLM (on-premise, Ollama)
  • kcare_robot — 23 skills over ROS2 · VisionServe — off-board GPU perception
  • Web dashboard — optional multi-user supervision layer

1 · Structured LLM programs

  • Task → sub-goals → typed steps; each step = one skill affordance + arguments
  • A symbolic gate simulates preconditions & effects — 0 LLM tokens
  • Invalid step → typed feedback to the model, fixed before the robot moves
The plan is a checkable program, not a story.
task: "bring me the coffee can"
 sub-goal 1: locate the object
   Find(object="coffee can")
 sub-goal 2: fetch and deliver
   MoveTo(place="table")
   Pick(object="coffee can")
   MoveTo(place="user")
   Place(object="coffee can")

gate: Pick rejected (object not found yet) → reordered before exec
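
A minimal sketch of how a zero-token symbolic gate could work: each step is simulated against a set of facts, and the first violated precondition is returned as typed feedback. The names Step, apply_step and check_plan, the fact tuples, and the precondition/effect rules are all assumptions for illustration, not the actual robot_agent or SAGE code.

```python
from dataclasses import dataclass


@dataclass
class Step:
    skill: str   # skill name, e.g. "Pick"
    args: dict   # arguments, e.g. {"object": "coffee can"}


def apply_step(step, state):
    """Check preconditions, then apply effects to `state` in place.

    Returns an error string if a precondition fails, otherwise None.
    The rules below are hypothetical, chosen to match the slide example.
    """
    if step.skill == "Find":
        state.add(("found", step.args["object"]))
    elif step.skill == "MoveTo":
        # The robot can be at only one place: drop the old location fact.
        state.difference_update({f for f in state if f[0] == "at"})
        state.add(("at", step.args["place"]))
    elif step.skill == "Pick":
        obj = step.args["object"]
        if ("found", obj) not in state:
            return f"Pick: object '{obj}' not found yet"
        if any(f[0] == "holding" for f in state):
            return "Pick: gripper already holding an object"
        state.add(("holding", obj))
    elif step.skill == "Place":
        obj = step.args["object"]
        if ("holding", obj) not in state:
            return f"Place: not holding '{obj}'"
        state.discard(("holding", obj))
    else:
        return f"unknown skill '{step.skill}'"
    return None


def check_plan(steps, initial_facts=()):
    """Simulate the whole plan; return (ok, failing_index, reason)."""
    state = set(initial_facts)
    for index, step in enumerate(steps):
        error = apply_step(step, state)
        if error is not None:
            return False, index, error
    return True, None, None


slide_plan = [
    Step("Find", {"object": "coffee can"}),
    Step("MoveTo", {"place": "table"}),
    Step("Pick", {"object": "coffee can"}),
    Step("MoveTo", {"place": "user"}),
    Step("Place", {"object": "coffee can"}),
]
# Same steps, but with Pick moved ahead of Find.
bad_plan = [slide_plan[2]] + slide_plan[:2] + slide_plan[3:]

print(check_plan(slide_plan))
print(check_plan(bad_plan))
```

The gate runs entirely on the fact set, so rejecting a plan costs no LLM tokens; only the returned reason string would be sent back to the model.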

2 · Confirmation loops — verify every step

closed-loop cognition stack
  • Closed loop: perceive → plan → map → act → verify → repair
  • Layered verifier: skill result → symbolic state → VLM check, cheapest first
  • Persistent world state (arrived · found · holding + grasp memory) survives across runs
  • Operator sees every event live and can edit the belief mid-run from the dashboard
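
One way to read "cheapest first" is as a short-circuiting chain: checks run in cost order and the first failure stops the chain, so the VLM is queried only when the cheaper layers pass. The sketch below assumes that reading; the observation fields and the three check functions are hypothetical stand-ins, and the VLM check is a stub rather than a real model call.

```python
from typing import Callable, List, Optional, Tuple

Check = Callable[[dict], bool]

calls = []  # records which layers actually ran, for illustration only


def skill_result_ok(obs: dict) -> bool:
    calls.append("skill result")
    return obs["skill_status"] == "success"


def symbolic_state_ok(obs: dict) -> bool:
    calls.append("symbolic state")
    return obs["expected_fact"] in obs["world_state"]


def vlm_ok(obs: dict) -> bool:
    calls.append("VLM")
    return obs["vlm_answer"] == "yes"  # stub: a real system would query a VLM


LAYERS: List[Tuple[str, Check]] = [
    ("skill result", skill_result_ok),      # cheapest
    ("symbolic state", symbolic_state_ok),
    ("VLM", vlm_ok),                        # most expensive
]


def verify_step(obs: dict, layers=LAYERS) -> Tuple[bool, Optional[str]]:
    """Run layers in order; return (passed, name of the failing layer or None)."""
    for name, check in layers:
        if not check(obs):
            return False, name
    return True, None


failed_grasp = {
    "skill_status": "failed",
    "expected_fact": ("holding", "phone"),
    "world_state": set(),
    "vlm_answer": "no",
}
print(verify_step(failed_grasp), calls)
```

In the failed-grasp case only the first layer runs, which is the point of ordering the checks by cost.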

3 · Hierarchical skill recovery

  • Level 1 · inside the skill — wrist-camera fine_move self-corrects the grasp approach
  • Level 2 · suffix-only repair — on a failed step, regenerate only the remaining steps of the failed sub-goal; the completed prefix is kept
  • Level 3 · bounded replanning — at most 3 replans per task; failures propagate cleanly, never loop forever
Recover at the cheapest level: 2.4–3.3× fewer LLM calls than whole-plan replanning.
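
A simplified sketch of suffix-only repair under a bounded replan budget. It assumes Level 1 correction happens inside execute, and it folds Levels 2 and 3 into one loop by counting every suffix regeneration against the budget of 3; the real system may separate them. run_with_suffix_repair, TaskFailed, and both callbacks are hypothetical names.

```python
class TaskFailed(Exception):
    """Raised when the replan budget is exhausted."""


def run_with_suffix_repair(steps, execute, regenerate_suffix, max_replans=3):
    """Execute steps in order; on failure, regenerate only the remaining steps.

    execute(step) -> bool            runs one skill (Level 1 lives inside it)
    regenerate_suffix(done, failed)  returns new remaining steps (one LLM call
                                     in a real system); `done` is never re-run
    """
    done = []               # completed prefix, kept across repairs
    pending = list(steps)
    replans = 0
    while pending:
        step = pending[0]
        if execute(step):
            done.append(pending.pop(0))
            continue
        if replans == max_replans:
            raise TaskFailed(f"gave up at {step!r} after {replans} replans")
        replans += 1
        pending = list(regenerate_suffix(done, step))
    return done, replans


# Usage: Pick fails once, the repair inserts a Find before retrying.
attempts = {"Pick": 0}


def execute(step):
    if step == "Pick":
        attempts["Pick"] += 1
        return attempts["Pick"] > 1
    return True


def regenerate_suffix(done, failed):
    return ["Find", "Pick", "Place"]


done, replans = run_with_suffix_repair(["MoveTo", "Pick", "Place"], execute, regenerate_suffix)
print(done, replans)
```

Because MoveTo stays in the completed prefix, the repair neither re-plans nor re-executes it, and the hard budget guarantees the loop terminates.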

Skills, vision & hardware

embodiment stack

23 stateless skills → pyconnect agents → ROS2 actuators · open-vocabulary perception (GroundingDINO · GroundedSAM · grasp detection) on an off-board GPU server

Real run · “pick the phone”

pick phone strip

(a) head-camera detection (ph 0.88, LYING) · (b) segmentation + grasp candidate (q0.97) · (c) wrist alignment (w72 mm, +2°) · (d) grasp verified (near=100%)

Real run · “pick the coffee can”

pick coffee strip

Same detect → segment → align → verify sequence on a standing can (co 0.80 · q0.96 · w68 mm · near=98%)

One platform, three arm-mount modes

three arm-mount modes

KAAIR 6-DOF arm on a vertical lift rail · two-finger + suction gripper · pan-tilt head RGB-D + wrist RGB-D · mobile base (Nav2)

Inside the planner (SAGE)

  • Hierarchical decomposition — task → sub-goals → steps (one LLM call each)
  • Hybrid memory — retrieves few-shot examples from a curated seed set plus its own successful episodes
  • Symbolic gate + suffix repair — the two mechanisms from pillars 1 & 3
  • Single open-weight model — the whole stack runs on-premise
2.4–3.3× fewer LLM calls to recover from a failure vs. whole-plan replanning
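
A toy sketch of the hybrid-memory idea: few-shot examples are drawn from one pool that mixes curated seeds with the planner's own successful episodes. The slides do not say how SAGE ranks examples; word-overlap (Jaccard) similarity is used here purely as a stand-in, and HybridMemory and its methods are hypothetical names.

```python
def jaccard(a, b):
    """Word-overlap similarity between two task strings (stand-in metric)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


class HybridMemory:
    def __init__(self, seeds):
        self.seeds = list(seeds)   # curated (task, plan) pairs
        self.episodes = []         # (task, plan) pairs from successful runs

    def record_success(self, task, plan):
        self.episodes.append((task, plan))

    def retrieve(self, task, k=2):
        """Return the k most similar examples from seeds plus episodes."""
        pool = self.seeds + self.episodes
        return sorted(pool, key=lambda ex: jaccard(task, ex[0]), reverse=True)[:k]


memory = HybridMemory([
    ("bring me the coffee can", ["Find", "MoveTo", "Pick", "MoveTo", "Place"]),
    ("tidy the table", ["Find", "Pick", "Place"]),
])
memory.record_success("bring me the phone", ["Find", "MoveTo", "Pick", "MoveTo", "Place"])

best = memory.retrieve("bring me the phone on the table", k=1)
print(best[0][0])
```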
cognition stack with SAGE panel

Three ways to drive the same robot

Mode            Entry point                 Use case
Web dashboard   HTTP / WebSocket            multi-user supervision, live world-state editing
CLI             kcare_robot skill::inputs   operators, scripting, tests
Python API      kcare_robot.skills.*        researchers, new behaviors
  • All three reach the same skill registry — identical behavior
  • New robots inherit the whole stack from a project template; per-site config profiles hot-switch deployments
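
A sketch of why three entry points can behave identically: each one is a thin adapter that resolves a name in one shared registry and calls the same function. The registry, the decorator, the move_to skill, the key=value CLI input format, and the JSON payload shape are all assumptions for illustration; the actual kcare_robot interfaces may differ.

```python
REGISTRY = {}


def skill(name):
    """Decorator that registers a function under a skill name."""
    def register(fn):
        REGISTRY[name] = fn
        return fn
    return register


@skill("move_to")
def move_to(place):
    # Stub: a real skill would command the mobile base.
    return {"skill": "move_to", "place": place, "status": "success"}


def run_cli(spec):
    """Dispatch a 'skill::key=value,key=value' string (hypothetical format)."""
    name, _, raw = spec.partition("::")
    kwargs = dict(pair.split("=", 1) for pair in raw.split(",") if pair)
    return REGISTRY[name](**kwargs)


def run_http(payload):
    """Dispatch a dashboard-style payload (hypothetical shape)."""
    return REGISTRY[payload["skill"]](**payload.get("args", {}))


via_api = move_to(place="table")
via_cli = run_cli("move_to::place=table")
via_http = run_http({"skill": "move_to", "args": {"place": "table"}})
print(via_api == via_cli == via_http)
```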

Takeaways

  • Structured LLM programs + a zero-token symbolic gate make plans checkable before the robot moves
  • Confirmation loops — layered verification + live human oversight — catch failures as they happen
  • Hierarchical recovery repairs at the cheapest level: 2.4–3.3× fewer LLM calls
  • Demonstrated on a real assistive mobile manipulator, fully on-premise
Thank you! · slides: ur2026.aistations.org · docs & videos: keti-ai.github.io/carerobotdocs