fix: 05_inference viser-kill + background-poll + offload_to_cpu from yaml #13

Merged
poulpe merged 1 commit from fix/05-inference-viser-kill-offload into feature/auto-pipeline 2026-05-14 04:47:11 +00:00

Root cause

demo.py starts a viser server after writing the PLY, so the SSH session blocks on viser and never exits. When the Python-side timeout (7200s) fires, the remote demo.py is left running as an orphan. Multiple orphans then compete for the GPU with --offload_to_cpu enabled, which degrades to pure CPU inference: 6h+ per 500-frame segment.

Fixes

  • Call kill_stale_demo_py() before each segment starts
  • Remote bash: run demo.py in the background with nohup, poll for the PLY every 30s, and kill demo.py once the PLY is written
  • offload_to_cpu is read from yaml[inference] (default false; with 24GB VRAM no offload is needed)
  • Timeout is read from yaml inference_timeout_s (10800s)
  • min_frames guard merged from fix/05-inference-min-frames-timeout
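
To illustrate the background-poll fix, here is a minimal sketch of how the remote bash script could be generated on the Python side. It is not the code from this PR: the function name `build_remote_script`, the `demo.log` file name, the exit codes, and the non-empty-file check (`-s`) for "PLY done" are all assumptions made for illustration. Only the overall shape (nohup in the background, 30s poll, kill once the PLY exists) comes from the description above.

```python
import shlex


def build_remote_script(demo_cmd: str, ply_path: str,
                        poll_s: int = 30, timeout_s: int = 10800) -> str:
    """Return a bash script that runs demo_cmd in the background and
    kills it once ply_path exists and is non-empty (hypothetical helper)."""
    ply = shlex.quote(ply_path)  # protect paths containing spaces etc.
    return f"""\
nohup {demo_cmd} > demo.log 2>&1 &
pid=$!
waited=0
while [ ! -s {ply} ]; do
    # Stop early if demo.py died without producing the PLY.
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "demo.py exited before writing the PLY" >&2
        exit 1
    fi
    # Give up (and clean up) after the configured timeout.
    if [ "$waited" -ge {timeout_s} ]; then
        kill "$pid" 2>/dev/null
        echo "timed out waiting for the PLY" >&2
        exit 2
    fi
    sleep {poll_s}
    waited=$((waited + {poll_s}))
done
# PLY is present: stop demo.py so the viser server does not block SSH.
kill "$pid" 2>/dev/null
exit 0
"""


if __name__ == "__main__":
    # Example paths are placeholders, not the project's real layout.
    print(build_remote_script("python demo.py", "/tmp/seg/out.ply"))
```

One caveat with this sketch: a non-empty file may still be mid-write when it is first seen, so a real script would probably also wait until the file size stops changing before killing the process. The generated text could be fed to the remote host over SSH on standard input (for example `ssh host bash -s`), with the Python-side timeout set somewhat above `timeout_s`.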
poulpe added 1 commit 2026-05-13 16:41:44 +00:00
- kill_stale_demo_py() before each segment to prevent GPU contention from orphan processes
- Remote script runs demo.py in background via nohup, polls for PLY file every 30s, kills viser server once PLY written — prevents indefinite SSH block on viser listener
- offload_to_cpu now read from thresholds.yaml[inference] (default false for 24GB VRAM)
- timeout reads inference_timeout_s from yaml (already 10800s)
- min_frames guard included (from fix/05-inference-min-frames-timeout)

Root cause: demo.py starts the viser server after writing the PLY; the SSH call timed out and left demo.py orphaned; two orphans then competed for the GPU with offload_to_cpu, resulting in pure CPU inference (6h+ for 493 frames)
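
The configuration side of the commit can be sketched as follows. This is a hedged illustration, not the merged code: `inference_settings` and `demo_args` are hypothetical helper names, the input is assumed to be the already-parsed contents of thresholds.yaml, and placing `inference_timeout_s` inside the `inference` section (with 10800 as the fallback) is an assumption, since the commit message only says the value is read from the yaml.

```python
DEFAULT_TIMEOUT_S = 10800  # assumed fallback; the yaml is said to hold 10800s


def inference_settings(thresholds: dict) -> tuple[bool, int]:
    """Extract (offload_to_cpu, timeout_s) from a parsed thresholds dict."""
    section = thresholds.get("inference") or {}
    # Default is False: with 24GB VRAM, offloading is not needed.
    offload = bool(section.get("offload_to_cpu", False))
    timeout_s = int(section.get("inference_timeout_s", DEFAULT_TIMEOUT_S))
    return offload, timeout_s


def demo_args(base_args: list[str], offload: bool) -> list[str]:
    """Append --offload_to_cpu only when the config asks for it."""
    return base_args + (["--offload_to_cpu"] if offload else [])


if __name__ == "__main__":
    cfg = {"inference": {"offload_to_cpu": False, "inference_timeout_s": 10800}}
    offload, timeout_s = inference_settings(cfg)
    print(demo_args(["python", "demo.py"], offload), timeout_s)
```

Keeping the flag out of the command line unless the yaml explicitly enables it means a missing or empty `inference` section falls back to GPU-only inference, which matches the stated default.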
poulpe merged commit 50ca77490d into feature/auto-pipeline 2026-05-14 04:47:11 +00:00

Reference: floppyrj45/cosma-qc#13