# cosma-qc: context for resuming work

## TL;DR

QC pipeline for COSMA dives:

1. ffmpeg frame extraction (fps=2, 518x294)
2. auto-trim of out-of-water head + tail (detection R` direct (no popup)

## Infra

- **core (.82)** Proxmox VM 101: FastAPI in docker `cosma-qc` on port 3849, SQLite `/home/floppyrj45/cosma-qc-data/jobs.db`, Python dispatcher under systemd `cosma-qc-dispatcher.service`
- **gpu (.87)** Proxmox VM 105: RTX 3060 12GB, user floppyrj45, `~/ai-video/lingbot-map`
- **ml-stack (.84)** physical host: RTX 3090 24GB, user root, `/root/ai-video/lingbot-map`
- **z620 (.168)** Proxmox hypervisor: SSD thin pool to keep an eye on (it reached 100% → VMs paused)
- Video source: `z620:/mnt/portablessd/COSMA - La ctiotat 8 avril/raw_data/medias/videos/`

## Credentials

- ssh floppyrj45@192.168.0.82 password SuperTeam2026!
- ssh gpu (alias → floppyrj45@.87) + ssh ml-stack (alias → root@.84), keys on core
- ssh root@192.168.0.168 password SuperTeam2026!
- Gitea http://192.168.0.82:3000/floppyrj45/cosma-qc (token in ~/cosma-qc-data/dispatcher.env)

## Service dispatcher

- systemd: `sudo systemctl {status,restart,stop} cosma-qc-dispatcher`
- Env: `/home/floppyrj45/cosma-qc-data/dispatcher.env` (DB, WORKERS JSON, FPS)
- Log: `/home/floppyrj45/cosma-qc-data/dispatcher.log` (append) + `journalctl -u cosma-qc-dispatcher`
- Code: `/home/floppyrj45/docker/cosma-qc/scripts/dispatcher.py` (user floppyrj45)
- FastAPI backend container: `docker restart cosma-qc` after editing `app/`

## Patches already applied

- `rm src_*.MP4` after extract (the Proxmox LVM thin pool is tight)
- fstrim /
- adaptive stride based on worker RAM (62/23 GB)
- estimate_vram_mib = fixed 6000 MiB
- RAM budget 0.35 (was 0.55, too optimistic → OOM)
- auto-trim of out-of-water frames (prefix + suffix)
- skip if total video is < 8 min (COSMA_QC_MIN_VIDEO_S=480)
- adaptive window_size: 16/32/64 depending on eff frames
- keep demo.py alive after PLY saved → persistent native viser
- load balance pick_worker: lower-load first (otherwise everything lands on .84)
- set_status auto-clears error on status change
- skipped status propagated into the per_auv stitch

## TODO / known issues

- [ ] Frame preview thumbnail: scp of the frame (done), endpoint `/jobs/{id}/thumbnail` (done). Backfill of old jobs: manual.
- [ ] Dashboard CSS: the layout is terrible according to the user. A cleanup pass was done, but it needs re-testing.
- [ ] GLB export (lingbot_map/vis/glb_export.py exists but is not exposed in demo.py; patch demo.py to add `--save_glb`)
- [ ] Final cross-AUV stitch: the code exists (stitch.py) but has not been tested yet
- [ ] segment_label is misleading (timestamp of the first MP4, not the total duration). Fix in ingest.py
- [ ] Dashboard header: dispatcher heartbeat "last seen Xs ago" can show > 5s after a restart if the heartbeat file is not reset

## Current job state (snapshot)

Jobs 9, 12, 13, 16, 19 done (old ones). 10 skipped (footage of a brown sweater on deck). 11/14 currently extracting. 15/17-21 queued.

The per-AUV stitch for AUV209 needs jobs 11/12/13/14/15 done (10 excluded). AUV210: 16/17/18/19/20/21.

Final cross-AUV = stitch id 6 (predicted) on .84 port 9206.

## Expected workflow (user)

1. ffmpeg at 2 fps, resize to 518x294 + out-of-water auto-trim → n frames
2. Balanced multi-GPU dispatch (.84 + .87)
3. demo.py windowed 64 on long sequences (>3000 eff)
4. Save PLY (+ GLB TODO)
5. Native viser via live PointCloudViewer
6. Per-AUV stitch, then cross-AUV = giant puzzle
7. Dashboard refresh with thumbnail preview
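
Three of the dispatcher rules listed in the patches above (skip under 8 minutes, adaptive window_size, lower-load-first worker selection) can be sketched as follows. This is an illustration only, not the code in `dispatcher.py`: the `Worker` record, the `active_jobs` field, the tie-break by name and the `mid_threshold` value are all assumptions. Only the 480 s minimum, the 16/32/64 choices and "more than 3000 effective frames uses 64" come from these notes.

```python
from dataclasses import dataclass

# Mirrors COSMA_QC_MIN_VIDEO_S=480 from the notes (8 minutes).
MIN_VIDEO_S = 480


@dataclass
class Worker:
    """Hypothetical worker record; the real WORKERS JSON schema is not shown here."""
    name: str
    active_jobs: int


def should_skip(total_video_s: float, min_video_s: float = MIN_VIDEO_S) -> bool:
    """Return True when the total video duration is under the minimum."""
    return total_video_s < min_video_s


def pick_window_size(eff_frames: int,
                     mid_threshold: int = 1000,
                     long_threshold: int = 3000) -> int:
    """Choose 16, 32 or 64 from the effective frame count.

    Only "> 3000 effective frames -> 64" is taken from the notes;
    mid_threshold is an assumed placeholder.
    """
    if eff_frames > long_threshold:
        return 64
    if eff_frames > mid_threshold:
        return 32
    return 16


def pick_worker(workers: list[Worker]) -> Worker:
    """Lower-load first; ties are broken by name (assumption) for determinism."""
    if not workers:
        raise ValueError("no workers configured")
    return min(workers, key=lambda w: (w.active_jobs, w.name))
```

With a busier `ml-stack` and an idle `gpu`, `pick_worker` returns `gpu`, which is the behaviour the "otherwise everything lands on .84" note is meant to guarantee.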
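
Step 1 of the workflow above (2 fps extraction resized to 518x294) could be driven from Python roughly like this. It is a sketch under assumptions: the function names, the `frame_%06d.jpg` naming pattern and the `delete_source` flag are hypothetical, and the auto-trim step is not covered. The `fps` and `scale` filters are standard ffmpeg video filters.

```python
import subprocess
from pathlib import Path


def build_extract_cmd(src: Path, out_dir: Path,
                      fps: int = 2, width: int = 518, height: int = 294) -> list[str]:
    """Build an ffmpeg command that samples frames at `fps` and resizes them."""
    return [
        "ffmpeg",
        "-hide_banner",
        "-i", str(src),
        "-vf", f"fps={fps},scale={width}:{height}",
        str(out_dir / "frame_%06d.jpg"),  # hypothetical naming pattern
    ]


def extract_frames(src: Path, out_dir: Path, delete_source: bool = False) -> None:
    """Run the extraction; optionally remove the local source copy afterwards."""
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_extract_cmd(src, out_dir), check=True)
    if delete_source:
        # The notes list "rm src_*.MP4 after extract" to spare the thin pool;
        # this flag is one possible way to express that, off by default.
        src.unlink()
```

Keeping the command builder separate from the `subprocess.run` call makes the filter string easy to check without ffmpeg installed.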