The archive is generated from source data and rendered as both list pages and standalone detail pages.
ARCHIVE
All writing
A chronological archive of research notes, demo write-ups, and blog essays—kept readable for people, and legible to search and sharing systems alike.
Every entry keeps its own stable address, ready to be revisited, linked, and discussed.
Can Verifiable Rewards Replace Constrained Decoding? Not Yet in This a2ui Run
We tested whether verifier-shaped training and one-step self-repair could narrow the gap to reliable a2ui structured outputs without dedicated constrained decoding. The best executable system improved from a 21.0% best pure-prompt baseline to 40.0% VRS@0.90. That is a real gain, but it still does not support replacing constrained decoding when reliability truly matters.
Skill Memory vs. Weight Updates: A Small Win for Memory, a Bigger Bottleneck Underneath
We compared three ways to help a tool-using AI agent improve over time: saving reusable skills, applying MinT-backed weight updates, or doing both. In this run, the simple memory-only path did best at 65.0% final success versus a 62.5% frozen baseline, but every method hit the same deeper bottleneck: a ticket-update schema the agent family never truly learned.
OpenClaw Security, Round 1: Mitigations Helped, but Prompt-to-Exec Risk Remained
In a controlled OpenClaw security study, a small mitigation bundle cut harmful task success from 87.5% to 12.5%. The catch: prompt-to-exec attacks still succeeded in 3 of 6 mitigated runs, so the result is real progress, not a clean all-clear.