# Stage 0 — DAgger collection (limb)

Collect rollouts where a served pi0.5 (Stage 2) or pi0.6 (Stage 3 / 6) policy drives the YAM autonomously and the operator intervenes via bilateral teleop. Every frame is timestamped and tagged with the current phase (`autonomous` / `paused` / `correcting`) and per-episode success/failure.

## Prerequisites

- A served policy on `0.0.0.0:8111` (see [Stage 2](stage2_sft.md) / [Stage 6](stage6_recap.md) for serving commands).
- All four YAM arms initialized cleanly (per the [setup diagnostics](setup.md#verify-everything-is-wired)).
- The iKKEGOL pedal connected and `pgrep -fa multiprocessing` showing no leftover orphans from a prior crashed run (CAN devices can't be shared across processes).

## The command

```bash
cd ~/limb
source .venv/bin/activate
uv run limb record --config-path \
    configs/yam_dagger_pi0_bimanual.yaml \
    configs/dagger_collection.yaml
```

Two YAML overlays:

- `configs/yam_dagger_pi0_bimanual.yaml`: DAgger agent (phase machine wrapping the pi0.x policy, bilateral teleop in CORRECTING) + 4-arm robot config + 3 RealSense cameras.
- `configs/dagger_collection.yaml`: Recording session (continuous per-frame recording, keyboard trigger for the episode lifecycle, `num_episodes: 100`, `episode_duration_s: 200`).

## Phase machine (foot pedal)

The DAgger agent has three phases, transitioned by the **pedal**:

```text
            left pedal           right pedal
AUTONOMOUS <───────────> PAUSED <───────────> CORRECTING
(policy drives)     (everyone holds)    (operator drives via leaders;
                                         followers track)
```

You **cannot** go AUTONOMOUS↔CORRECTING directly; the transition always passes through PAUSED. This guarantees the operator is positioned before control changes hands.

| Pedal | Effect (from current phase) |
|-------|-----------------------------|
| Left  | `AUTONOMOUS` ↔ `PAUSED`     |
| Right | `PAUSED` ↔ `CORRECTING`     |

The agent boots in `initial_phase: paused` so the policy doesn't start moving the moment the stack launches.

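
The pedal rules above can be read as a small transition table. The following is a minimal sketch, not limb's actual implementation: the names `Phase`, `TRANSITIONS`, and `next_phase` are hypothetical, and it assumes that a pedal press with no listed transition (for example the left pedal while CORRECTING) is simply ignored, which the text above does not specify.

```python
from enum import Enum


class Phase(Enum):
    AUTONOMOUS = "autonomous"
    PAUSED = "paused"
    CORRECTING = "correcting"


# Allowed transitions keyed by (current phase, pedal).
# There is deliberately no AUTONOMOUS <-> CORRECTING entry.
TRANSITIONS = {
    (Phase.AUTONOMOUS, "left"): Phase.PAUSED,
    (Phase.PAUSED, "left"): Phase.AUTONOMOUS,
    (Phase.PAUSED, "right"): Phase.CORRECTING,
    (Phase.CORRECTING, "right"): Phase.PAUSED,
}


def next_phase(phase: Phase, pedal: str) -> Phase:
    """Return the phase after a pedal press.

    Assumption: presses with no listed transition leave the phase unchanged.
    """
    return TRANSITIONS.get((phase, pedal), phase)


if __name__ == "__main__":
    phase = Phase.PAUSED  # mirrors `initial_phase: paused`
    # Start the policy, pause it, correct, pause again, resume.
    for pedal in ["left", "left", "right", "right", "left"]:
        phase = next_phase(phase, pedal)
        print(f"{pedal} pedal -> {phase.value}")
```

Keeping the table explicit makes the "always through PAUSED" guarantee easy to audit: any direct AUTONOMOUS to CORRECTING jump would have to appear as an extra dictionary entry.
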
## Episode lifecycle (keyboard)

A separate **keyboard** trigger drives the recording session: episode start / save / discard. This avoids contention with the pedal (the pedal trigger grabs the iKKEGOL exclusively).

| Key                        | Effect                                                               |
|----------------------------|----------------------------------------------------------------------|
| `SPACE` (between episodes) | start the next episode (recording begins)                            |
| `s`                        | end the current episode, save with `SUCCESS` marker                  |
| `SPACE` (during episode)   | end the current episode, save with `FAILURE` marker (policy missed)  |
| `d`                        | discard the current episode (delete from disk)                       |
| `q`                        | quit the session                                                     |

A typical episode:

1. (Between episodes) Stage the scene; **`SPACE`** to start recording.
2. Robot is in PAUSED → optionally **right pedal** → CORRECTING, drive followers to the start pose via the leaders, **right pedal** back to PAUSED.
3. **Left pedal** → AUTONOMOUS; policy attempts the task.
4. If it drifts: **left pedal** (PAUSED) → **right pedal** (CORRECTING), bilaterally teleop back on-task, **right pedal** (PAUSED) → **left pedal** (AUTONOMOUS).
5. **`s`** if the policy completed the task, **`SPACE`** if it failed.

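
Because `SPACE` means different things between and during episodes, it can help to see the key table as a function of the recording state. This is only an illustrative sketch: `handle_key` and its action labels are hypothetical names, not limb identifiers, and it assumes that keys not listed for a given state are ignored. What `q` does to an in-progress episode is not stated above, so the sketch does not model it.

```python
def handle_key(recording: bool, key: str) -> tuple[bool, str]:
    """Map a key press to (recording_after, action) following the key table.

    `recording` is True while an episode is being recorded.
    Action strings are hypothetical labels used only in this sketch.
    """
    if key == "q":
        # Quit the session; the fate of an in-progress episode is unspecified.
        return False, "quit"
    if not recording:
        # Between episodes: only SPACE starts recording (other keys assumed ignored).
        if key == "space":
            return True, "start"
        return False, "ignored"
    # During an episode.
    if key == "s":
        return False, "save_success"
    if key == "space":
        return False, "save_failure"
    if key == "d":
        return False, "discard"
    return True, "ignored"


if __name__ == "__main__":
    recording = False
    # One failed episode followed by one successful episode.
    for key in ["space", "space", "space", "s"]:
        recording, action = handle_key(recording, key)
        print(f"{key!r}: {action}")
```
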
## What gets recorded

Per episode under `recordings/_/episode_<...>/`:

| File                                 | What it is                                                  |
|--------------------------------------|-------------------------------------------------------------|
| `{arm}_actions.npz` (`pos`)          | commanded action that frame (operator's during CORRECTING)  |
| `{arm}_states.npz`                   | observed state (joints + gripper)                           |
| `{arm}_policy_actions.npz` (`pos`)   | what the **policy** would have produced (shadow stream)     |
| `{cam}.mp4` + `{cam}_timestamps.npy` | per-camera video + timestamps                               |
| `phase.npy`                          | per-tick phase string (`autonomous`/`paused`/`correcting`)  |
| `interventions.npy`                  | per-tick `intervention` bool (legacy column)                |
| `correction_index.npy`               | per-tick id grouping consecutive CORRECTING frames          |
| `timestamps.npy`                     | per-tick control-loop timestamp                             |
| `SUCCESS` or `FAILURE`               | the marker you wrote with `s` / `SPACE`                     |
| `metadata.json`                      | task instruction, arm names, cameras, etc.                  |

`policy_action` (the policy's shadow output even during CORRECTING) is the key new stream needed for RECAP; see [Stage 1](stage1_conversion.md) for how it gets surfaced.

## Hygiene rules

1. **Keep both success and failure episodes.** RECAP's value model needs both classes to converge. ~30–60% success is a healthy band for productive correction collection. We've observed ~30% in practice on small datasets.
2. **One task instruction per session.** Pistar's value model and percentile labeling are per-task; mixing breaks the percentile.
3. **Label honestly.** The `SUCCESS`/`FAILURE` marker drives `reward`, `value_label`, and downstream advantage signs. A mislabeled episode poisons the value model directly.
4. **Don't filter "boring" autonomous successes out.** Stage 5 will classify those frames as `positive` for free, and they're useful training signal.

## Scale guidance (from the pi0.6 paper)

The π★₀.₆ paper, Appendix A-F, reports per-task collection counts.
There is no scaling curve and no explicit "enough" criterion; what they actually did:

| Task              | Demos | Autonomous | **Correction episodes** | Iterations |
|-------------------|-------|------------|-------------------------|------------|
| Laundry (diverse) | —     | 450        | **287**                 | one set    |
| Espresso / café   | —     | 414        | **429**                 | per iter   |
| Box assembly      | 600   | —          | **360**                 | × 2 iters  |

For a small-scale smoke test the [Stage 3 LoRA path](stage3_lora.md) runs on ~10 episodes; for genuine RECAP improvement at paper scale aim for **~300 correction episodes per iteration**.

A practical stopping signal: when the latest batch of episodes shows **intervention rate < 10%** (the policy is succeeding on its own), stop collecting and start a new iteration.

## Reference dataset

The reference dataset shipped with this site is at `datasets/vial_rollout_v1_v21/` (10 episodes, 21,286 frames @ 30 fps, 3 SUCCESS / 7 FAILURE, 32.7% intervention rate). It's enough to validate the full pipeline; it is **not** enough to produce a RECAP-improved policy.

## Next

Convert the raw episodes → [Stage 1](stage1_conversion.md).
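
Before converting, the intervention-rate stopping signal can be estimated from the per-tick phase tags. The sketch below is hedged: it assumes "intervention rate" means the fraction of ticks tagged `correcting`, which this page does not define explicitly, and the helper names `intervention_rate` and `correction_segments` are hypothetical. It operates on any sequence of phase strings so it stays independent of how `phase.npy` is stored.

```python
from typing import Sequence


def intervention_rate(phases: Sequence[str]) -> float:
    """Fraction of ticks tagged `correcting` (assumed definition of the rate)."""
    if len(phases) == 0:
        return 0.0
    return sum(1 for p in phases if p == "correcting") / len(phases)


def correction_segments(phases: Sequence[str]) -> int:
    """Count runs of consecutive `correcting` ticks."""
    count = 0
    previous = None
    for p in phases:
        # A new segment starts whenever we enter `correcting` from another phase.
        if p == "correcting" and previous != "correcting":
            count += 1
        previous = p
    return count


if __name__ == "__main__":
    # Toy episode: one correction lasting two of eight ticks.
    phases = [
        "paused", "autonomous", "autonomous", "paused",
        "correcting", "correcting", "paused", "autonomous",
    ]
    rate = intervention_rate(phases)
    print(f"intervention rate: {rate:.1%}, segments: {correction_segments(phases)}")
    print("below 10% threshold" if rate < 0.10 else "keep collecting")
```

For a real batch, the phase sequences of all recent episodes could be concatenated before computing the rate, for example after loading each episode's `phase.npy` with `numpy.load`.
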