All writing

A chronological archive of research notes, demo write-ups, and blog essays, kept readable for people and legible to search and sharing systems.

Entries in the archive: 3

The archive is generated from source data and rendered as both list pages and standalone detail pages.

Independent URLs: Standalone entry pages

Every entry keeps its own stable address, ready to be revisited, linked, and discussed.

2026-03-20

Research entry

Can Verifiable Rewards Replace Constrained Decoding? Not Yet in This a2ui Run

Rio AI Research Lab

We tested whether verifier-shaped training and one-step self-repair could narrow the gap to reliable a2ui structured outputs without dedicated constrained decoding. The best executable system improved from a 21.0% best pure-prompt baseline to 40.0% VRS@0.90. That is a real gain, but it still does not support replacing constrained decoding when reliability truly matters.

2026-03-17

Research entry

Skill Memory vs. Weight Updates: A Small Win for Memory, a Bigger Bottleneck Underneath

Rio AI Research Lab

We compared three ways to help a tool-using AI agent improve over time: saving reusable skills, applying MinT-backed weight updates, or doing both. In this run, the simple memory-only path did best at 65.0% final success versus a 62.5% frozen baseline, but every method hit the same deeper bottleneck: a ticket-update schema the agent family never truly learned.

agent-learning continual-learning tool-use mint skill-memory benchmark

2026-03-14

Research entry

OpenClaw Security, Round 1: Mitigations Helped, but Prompt-to-Exec Risk Remained

Rio AI Research Lab

In a controlled OpenClaw security study, a small mitigation bundle cut harmful task success from 87.5% to 12.5%. The catch: prompt-to-exec attacks still succeeded in 3 of 6 mitigated runs, so this is real progress, not a clean all-clear.

agent-security openclaw benchmark prompt-injection sandbox trust-boundary
