UR 2026 · Ubiquitous Robots · Osaka, Japan

Robust Assistive Mobile Manipulation via Structured LLM Programs, Confirmation Loops, and Hierarchical Skill Recovery

Trung Bui et al.
Korea Electronics Technology Institute (KETI)

July 2026 · Ritsumeikan University, Osaka

Why an assistive mobile manipulator?

People with disabilities need help with everyday object tasks — fetch, deliver, tidy
Homes are cluttered, dynamic, unstructured — fixed scripts break
Language is the natural interface: “bring me the coffee can on the table”

LLMs plan well — but a wrong step on a real robot is expensive. Robustness must come from the system, not the model alone.

Three mechanisms for robustness

1 · Structured LLM programs — the planner emits typed, schema-constrained skill calls, gated by a zero-token symbolic check before execution
2 · Confirmation loops — every executed step is verified in layers (skill result → symbolic state → VLM), and the operator can inspect & edit the robot’s belief live
3 · Hierarchical skill recovery — failures are repaired at the cheapest possible level: in-skill correction → suffix-only replan → bounded full replan

Together: language flexibility + execution safety + fault tolerance — on one integrated robot.

System at a glance

robot_agent — robot-agnostic runtime: closed planning loop, world state, verifier, FastAPI + WebSocket
SAGE planner (pyplanner) + a single open-weight LLM (on-premise, Ollama)
kcare_robot — 23 skills over ROS2 · VisionServe — off-board GPU perception
Web dashboard — optional multi-user supervision layer

1 · Structured LLM programs

Task → sub-goals → typed steps; each step = one skill affordance + arguments
A symbolic gate simulates preconditions & effects — 0 LLM tokens
Invalid step → typed feedback to the model, fixed before the robot moves

The plan is a checkable program, not a story.

task: "bring me the coffee can"
 sub-goal 1: locate the object
   Find(object="coffee can")
 sub-goal 2: fetch and deliver
   MoveTo(place="table")
   Pick(object="coffee can")
   MoveTo(place="user")
   Place(object="coffee can")

gate: Pick rejected (object not found yet) → reordered before execution
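
The gate above can be illustrated with a minimal sketch. It assumes a tiny belief state and four skills taken from the example plan; the names `Step`, `Belief`, `apply_step` and `gate`, and the specific precondition rules, are hypothetical and are not the SAGE implementation. The point is only that a plan made of typed steps can be simulated and rejected with typed feedback, with no LLM call and no robot motion.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Step:
    skill: str  # "Find", "MoveTo", "Pick" or "Place" (illustrative subset)
    arg: str    # object or place name


@dataclass
class Belief:
    found: Set[str] = field(default_factory=set)
    location: Optional[str] = None
    holding: Optional[str] = None


def apply_step(belief: Belief, step: Step) -> Optional[str]:
    """Check preconditions, then apply effects. Returns an error or None."""
    if step.skill == "Find":
        belief.found.add(step.arg)
    elif step.skill == "MoveTo":
        belief.location = step.arg
    elif step.skill == "Pick":
        if step.arg not in belief.found:
            return "object not found yet"
        if belief.holding is not None:
            return "gripper already holding an object"
        belief.holding = step.arg
    elif step.skill == "Place":
        if belief.holding != step.arg:
            return "object is not being held"
        belief.holding = None
    else:
        return "unknown skill"
    return None


def gate(plan: List[Step]) -> Tuple[bool, str]:
    """Simulate the whole plan symbolically before anything is executed."""
    belief = Belief()
    for i, step in enumerate(plan):
        error = apply_step(belief, step)
        if error:
            return False, f"step {i} {step.skill}({step.arg!r}): {error}"
    return True, "ok"


can = "coffee can"
good_plan = [Step("Find", can), Step("MoveTo", "table"), Step("Pick", can),
             Step("MoveTo", "user"), Step("Place", can)]
# Same steps, but Pick comes before Find, as in the rejected draft.
bad_plan = [Step("MoveTo", "table"), Step("Pick", can), Step("Find", can),
            Step("MoveTo", "user"), Step("Place", can)]

good_result = gate(good_plan)
bad_result = gate(bad_plan)
print(good_result)
print(bad_result)
```

In this sketch the feedback string is what would be handed back to the model so it can reorder the plan; how the real planner formats that feedback is not stated in the slides.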

2 · Confirmation loops — verify every step

Closed loop: perceive → plan → map → act → verify → repair
Layered verifier: skill result → symbolic state → VLM check — cheapest first
Persistent world state (arrived · found · holding + grasp memory) survives across runs
Operator sees every event live and can edit the belief mid-run from the dashboard
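
A minimal sketch of the cheapest-first idea, under one stated assumption: each layer may confirm, reject, or abstain (return `None`), and the next, more expensive layer runs only when the previous one abstains. The slides do not specify the escalation rule, so this tri-state design, the evidence keys, and the stubbed `vlm_check` are all hypothetical.

```python
from typing import Callable, List, NamedTuple, Optional, Tuple


class Verdict(NamedTuple):
    ok: bool
    layer: str  # which layer made the decision


vlm_calls = 0  # counts how often the expensive layer is reached


def skill_result(ev: dict) -> Optional[bool]:
    # A skill reporting failure is conclusive; reported success is not proof.
    return False if not ev["skill_ok"] else None


def symbolic_state(ev: dict) -> Optional[bool]:
    # Decide only if the belief actually says something about the gripper.
    believed = ev.get("believed_holding")
    if believed is None:
        return None
    return believed == ev["expected_holding"]


def vlm_check(ev: dict) -> Optional[bool]:
    # Stand-in for a vision-language model query (assumed, not a real API).
    global vlm_calls
    vlm_calls += 1
    return ev.get("vlm_answer")


Layer = Tuple[str, Callable[[dict], Optional[bool]]]
LAYERS: List[Layer] = [
    ("skill_result", skill_result),
    ("symbolic_state", symbolic_state),
    ("vlm", vlm_check),
]


def verify(ev: dict, layers: List[Layer] = LAYERS) -> Verdict:
    """Run layers cheapest first and stop at the first one that decides."""
    for name, check in layers:
        decision = check(ev)
        if decision is not None:
            return Verdict(decision, name)
    return Verdict(False, "undecided")  # nobody could confirm: treat as failure


failed = verify({"skill_ok": False, "expected_holding": "phone"})
confirmed = verify({"skill_ok": True, "expected_holding": "phone",
                    "believed_holding": "phone"})
escalated = verify({"skill_ok": True, "expected_holding": "phone",
                    "vlm_answer": True})
print(failed, confirmed, escalated, vlm_calls)
```

Only the third case reaches the VLM stub, which is the cost argument behind ordering the layers this way.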

3 · Hierarchical skill recovery

Level 1 · inside the skill — wrist-camera fine_move self-corrects the grasp approach
Level 2 · suffix-only repair — on a failed step, regenerate only the remaining steps of the failed sub-goal; the completed prefix is kept
Level 3 · bounded replanning — at most 3 replans per task; failures propagate cleanly, never loop forever

Recover at the cheapest level: 2.4–3.3× fewer LLM calls than whole-plan replanning.
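
Level 2 with a hard bound can be sketched as follows. This is an illustration, not the authors' code: `run_with_suffix_repair`, `RecoveryExhausted` and the callback signatures are hypothetical, the `regenerate_suffix` callback stands in for the LLM call, and applying the limit of 3 to suffix repairs (rather than only to full replans) is an assumption made to keep the example short.

```python
from typing import Callable, List, Tuple


class RecoveryExhausted(RuntimeError):
    """Raised when the replan budget is spent, so the failure propagates."""


def run_with_suffix_repair(
    steps: List[str],
    execute: Callable[[str], bool],
    regenerate_suffix: Callable[[List[str], str, List[str]], List[str]],
    max_replans: int = 3,
) -> Tuple[List[str], int]:
    done: List[str] = []          # completed prefix, never re-executed
    pending = list(steps)
    replans = 0
    while pending:
        step = pending.pop(0)
        if execute(step):
            done.append(step)
            continue
        if replans >= max_replans:
            raise RecoveryExhausted(f"gave up on {step!r} after {replans} replans")
        replans += 1
        # Only the remaining steps are regenerated; `done` is kept as is.
        pending = regenerate_suffix(done, step, pending)
    return done, replans


attempts: List[str] = []


def flaky_execute(step: str) -> bool:
    # Simulated robot: the first Pick attempt fails, everything else succeeds.
    attempts.append(step)
    return not (step == "Pick" and attempts.count("Pick") == 1)


def retry_suffix(done: List[str], failed: str, remaining: List[str]) -> List[str]:
    # Trivial stand-in for the planner: retry the failed step, keep the rest.
    return [failed] + remaining


done, replans = run_with_suffix_repair(
    ["MoveTo", "Pick", "Place"], flaky_execute, retry_suffix
)
print(done, replans, attempts)
```

In the demo run `MoveTo` is executed once even though `Pick` fails, which is the property the slide calls keeping the completed prefix. A step that never succeeds raises `RecoveryExhausted` once the budget is used up instead of looping.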

Skills, vision & hardware

23 stateless skills → pyconnect agents → ROS2 actuators · open-vocabulary perception (GroundingDINO · GroundedSAM · grasp detection) on an off-board GPU server

Real run · “pick the phone”

(a) head-camera detection (ph 0.88, LYING) · (b) segmentation + grasp candidate (q0.97) · (c) wrist alignment (w72 mm, +2°) · (d) grasp verified (near=100%)

Real run · “pick the coffee can”

Same detect → segment → align → verify sequence on a standing can (co 0.80 · q0.96 · w68 mm · near=98%)

One platform, three arm-mount modes

KAAIR 6-DOF arm on a vertical lift rail · two-finger + suction gripper · pan-tilt head RGB-D + wrist RGB-D · mobile base (Nav2)

Inside the planner (SAGE)

Hierarchical decomposition — task → sub-goals → steps (one LLM call each)
Hybrid memory — retrieves few-shot examples from a curated seed set plus its own successful episodes
Symbolic gate + suffix repair — the two mechanisms from pillars 1 & 3
Single open-weight model — the whole stack runs on-premise

2.4–3.3× fewer LLM calls to recover from a failure vs. whole-plan replanning
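
The hybrid memory bullet can be pictured with a small sketch. The slides do not say how examples are scored or stored, so everything here is an assumption: `HybridMemory` is a hypothetical class, and word-overlap (Jaccard) similarity is swapped in as the simplest possible retrieval measure.

```python
from typing import List, Tuple

Example = Tuple[str, List[str]]  # (task text, plan steps)


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity between two task strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


class HybridMemory:
    def __init__(self, seeds: List[Example]) -> None:
        self.seeds = list(seeds)           # curated examples, fixed
        self.episodes: List[Example] = []  # grows with successful runs

    def record_success(self, task: str, plan: List[str]) -> None:
        self.episodes.append((task, plan))

    def retrieve(self, task: str, k: int = 2) -> List[Example]:
        # Seeds and learned episodes compete in one pool.
        pool = self.seeds + self.episodes
        return sorted(pool, key=lambda ex: jaccard(task, ex[0]), reverse=True)[:k]


memory = HybridMemory([
    ("bring me the cup", ["Find", "MoveTo", "Pick", "MoveTo", "Place"]),
    ("tidy the table", ["Find", "Pick", "Place"]),
])
memory.record_success("bring me the coffee can",
                      ["Find", "MoveTo", "Pick", "MoveTo", "Place"])

few_shot = memory.retrieve("bring me the phone", k=2)
print([task for task, _ in few_shot])
```

The retrieved pairs would be placed in the prompt as few-shot examples; a real system would more likely use embeddings than word overlap.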

Three ways to drive the same robot

Mode	Entry point	Use case
Web dashboard	HTTP / WebSocket	multi-user supervision, live world-state editing
CLI	`kcare_robot skill::inputs`	operators, scripting, tests
Python API	`kcare_robot.skills.*`	researchers, new behaviors

All three reach the same skill registry — identical behavior
New robots inherit the whole stack from a project template; per-site config profiles hot-switch deployments
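
Why three entry points can behave identically is easiest to see with a toy registry. This sketch does not use or reproduce the `kcare_robot` package: `REGISTRY`, `skill`, `call`, `run_cli` and `move_to` are hypothetical names, and the `name::key=value` argument syntax is invented for illustration, since the slides only show the `skill::inputs` shape.

```python
from typing import Any, Callable, Dict

REGISTRY: Dict[str, Callable[..., Any]] = {}


def skill(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function under its own name."""
    REGISTRY[fn.__name__] = fn
    return fn


@skill
def move_to(place: str) -> str:
    # Placeholder body; a real skill would command the robot.
    return f"moving to {place}"


def call(name: str, **kwargs: Any) -> Any:
    """Single dispatch path shared by every front end."""
    return REGISTRY[name](**kwargs)


def run_cli(spec: str) -> Any:
    """Parse an assumed 'name::key=value,key=value' string and dispatch."""
    name, _, raw = spec.partition("::")
    kwargs = dict(pair.split("=", 1) for pair in raw.split(",") if pair)
    return call(name, **kwargs)


via_cli = run_cli("move_to::place=table")
via_api = call("move_to", place="table")
print(via_cli, via_api)
```

A web handler would call the same `call` function, so all front ends share one code path per skill.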

Takeaways

Structured LLM programs + a zero-token symbolic gate make plans checkable before the robot moves
Confirmation loops — layered verification + live human oversight — catch failures as they happen
Hierarchical recovery repairs at the cheapest level: 2.4–3.3× fewer LLM calls
Demonstrated on a real assistive mobile manipulator, fully on-premise

Thank you! · slides: ur2026.aistations.org · docs & videos: keti-ai.github.io/carerobotdocs