RECAP on YAM

Pipeline stages

  • Overview
    • What RECAP is
    • The six stages
    • How our pipeline differs from RLinf and Evo-RL
    • Architecture at a glance
    • When to use which checkpoint
    • YAM TrainConfig reference
      • Picking one
  • Setup
    • Repositories
      • Clone
      • Patches you must apply to pistar
    • Python environments
    • VLM checkpoint (for Stage 4)
    • pi0.5 base weights (for Stage 2 SFT)
    • Hardware
    • Verify everything is wired
    • Next
  • Stage 0 — DAgger collection (limb)
    • Prerequisites
    • The command
    • Phase machine (foot pedal)
    • Episode lifecycle (keyboard)
    • What gets recorded
    • Hygiene rules
    • Scale guidance (from the pi0.6 paper)
    • Reference dataset
    • Next
  • Stage 1 — Convert to LeRobot v2.1 with five RECAP columns
    • The five RECAP columns
    • Command
      • Flags that matter for pi0.6
      • SFT demo variant
    • Verify the conversion
    • Why two converters
    • Symlink into pistar’s lerobot cache
    • Next
  • Stage 2 — Initial SFT (openpi, full fine-tune)
    • Required inputs
    • Add a YAM TrainConfig to openpi
    • Compute norm stats (~25 min, one-time)
    • Train (full fine-tune, ~3 h on 8× H100)
    • Push to HuggingFace (recommended)
    • Stage 3 / Stage 6 input — what they need from here
    • Gotchas (from openpi yam_finetune.md)
    • Next
  • Stage 3 — pi0.6 fine-tune from SFT (no VLM yet)
    • What the configs look like
    • Train
    • LoRA variant
    • What this gives you, what it doesn’t
    • Next
  • Stage 4 — VLM value model training (pistar, patched)
    • Required inputs
    • Quick smoke test (5 steps, ~30 s)
    • Real training run
    • Tuning knobs
    • What the saved checkpoint contains
    • Healthy training signals
    • Common failures
    • Next
  • Stage 5 — Advantage labeling (VLM relabel of adv_ind)
    • Make a standalone copy of the dataset
    • What it does
    • Command
      • Flag explanations
      • Tuning
    • Verify the relabel
    • Operating principle (intuition)
    • Next
  • Stage 6 — Full RECAP fine-tune
    • Inputs
    • The _recap TrainConfig
    • Command
      • How to verify at runtime that the VLM-labeled dataset is loading
      • Continue from a Stage 3 checkpoint instead of the SFT
    • What’s different from Stage 3 (no code, only data)
    • Healthy signs at Stage 6
    • Multi-iteration loop (paper-scale)
    • What goes wrong, and what to do
    • Next
  • Evaluation — Serve + deploy via limb
    • Serve
      • Match the model variant
    • limb side — add adv_ind: "positive" to the obs transform
    • What’s served vs what limb sends
    • Quantitative evaluation
    • Common deployment issues
    • Closed loop — feeding evaluation back into training
    • Next

Reference

  • Patches reference — making pistar Stage 4 / 5 actually run
    • Patch summary
    • Patches in detail
      • 1. ValueModelWeightLoader
      • 2. Missing gemma/gm/data/ directory
      • 3. kauldron.ktyping → kauldron.typing
      • 4. & 5. etils.edc.ContextStack in _dtype_params.py
      • 6. & 7. openpi.shared.{console,progress} stubs
      • 8. & 9. DataConfig field rename + action key
      • 10. create_value_data_loader
      • 11. DataLoaderImpl.dataset + __len__
      • 12. TrainState → flax.struct.PyTreeNode
      • 13. _ValueDataLoaderImpl, GemmaValueTokenizer.tokenize extra args, int(step) for tqdm
      • 14. & 15. Same gaps in label_advantage_from_vlm.py (Stage 5)
    • End-to-end verification


© Copyright 2026, limb + pistar contributors.
