auto-iter 2026-05-13: offload_to_cpu=false (.84 24GB VRAM, no CPU offload needed)

chore: iter-7 veille + log (2026-05-13)
2026-05-13 16:39:51 +00:00 · 2026-05-13 10:42:37 +00:00
4 changed files with 66 additions and 34 deletions
--- a/pipeline/config/thresholds.yaml
+++ b/pipeline/config/thresholds.yaml
@@ -1,34 +1,29 @@
-# QA thresholds — tuned from iteration cron
 usbl:
-  min_points_per_segment: 5       # fewer → degraded
-  max_gap_seconds: 30             # gap > this → split segment
-  mad_sigma: 3.0                  # MAD outlier threshold
-  moving_avg_window: 5            # smoothing window
-
+  min_points_per_segment: 5
+  max_gap_seconds: 30
+  mad_sigma: 3.0
+  moving_avg_window: 5
 ingest:
-  min_video_seconds: 120          # shorter segments skipped
-  max_timestamp_delta_seconds: 60 # EXIF vs USBL match tolerance
-
+  min_video_seconds: 120
+  max_timestamp_delta_seconds: 60
 frame_extract:
  fps: 1
  width: 518
  height: 294
-  underwater_r_minus_g: 5        # R < G-5 AND R < B-5 → out of water
-  trim_min_frames: 8             # skip if fewer underwater frames
-  bottom_visible_pct_min: 25     # lowered 30→25 so GX019817 (29%) is recoverable, auto-iter 2026-05-12
-
+  underwater_r_minus_g: 5
+  trim_min_frames: 8
+  bottom_visible_pct_min: 25
 inference:
  ply_conf_threshold: 1.5
  max_frame_num: 1024
  mode: streaming
  keyframe_interval: 1
-  min_frames_for_inference: 32   # fewer frames → RoPE/attention mismatch errors
-  inference_timeout_s: 10800     # 3h (was 7200=2h, GX029818 timed out with 493 frames)
-
+  min_frames_for_inference: 32
+  inference_timeout_s: 10800
+  offload_to_cpu: false
 align:
-  max_translation_m: 500         # sanity check on alignment
-  min_inlier_ratio: 0.3          # umeyama inlier ratio
-
+  max_translation_m: 500
+  min_inlier_ratio: 0.3
 stitch:
  voxel_size: 0.05
  icp_max_distance: 0.5
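The removed comments on the `usbl:` thresholds describe a small QA recipe: split the track wherever the gap between fixes exceeds `max_gap_seconds`, mark segments with fewer than `min_points_per_segment` fixes as degraded, reject outliers at `mad_sigma` robust deviations, then smooth with a `moving_avg_window`-point average. What follows is a minimal sketch of that recipe, not the pipeline's own code. It assumes time-sorted `(t, x, y)` fixes and the usual 1.4826 Gaussian scaling for the MAD. The function names and the order of operations are hypothetical; only the threshold values come from the YAML.

```python
from statistics import median

# Threshold values copied from the usbl: section of thresholds.yaml.
MIN_POINTS_PER_SEGMENT = 5
MAX_GAP_SECONDS = 30
MAD_SIGMA = 3.0
MOVING_AVG_WINDOW = 5
MAD_TO_STD = 1.4826  # assumption: Gaussian consistency factor for the MAD


def split_on_gaps(fixes, max_gap=MAX_GAP_SECONDS):
    """Split time-sorted (t, x, y) fixes wherever consecutive fixes are > max_gap s apart."""
    segments, current = [], []
    for fix in fixes:
        if current and fix[0] - current[-1][0] > max_gap:
            segments.append(current)
            current = []
        current.append(fix)
    if current:
        segments.append(current)
    return segments


def mad_keep_mask(values, sigma=MAD_SIGMA):
    """True where a value lies within sigma robust standard deviations of the median."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Degenerate spread: keep everything rather than divide by zero.
        return [True] * len(values)
    return [abs(v - med) <= sigma * MAD_TO_STD * mad for v in values]


def moving_average(values, window=MOVING_AVG_WINDOW):
    """Centered moving average; the window shrinks at both ends of the series."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def qa_segment(segment):
    """Return (status, cleaned fixes) for one gap-free segment."""
    if len(segment) < MIN_POINTS_PER_SEGMENT:
        return "degraded", segment
    keep_x = mad_keep_mask([f[1] for f in segment])
    keep_y = mad_keep_mask([f[2] for f in segment])
    kept = [f for f, kx, ky in zip(segment, keep_x, keep_y) if kx and ky]
    if len(kept) < MIN_POINTS_PER_SEGMENT:
        return "degraded", kept
    xs = moving_average([f[1] for f in kept])
    ys = moving_average([f[2] for f in kept])
    return "ok", [(f[0], x, y) for f, x, y in zip(kept, xs, ys)]


if __name__ == "__main__":
    track = [(0, 0.0, 0.0), (5, 1.0, 0.1), (10, 2.0, 0.2), (15, 50.0, 0.3),
             (20, 4.0, 0.4), (25, 5.0, 0.5), (100, 6.0, 0.6)]
    for seg in split_on_gaps(track):
        status, cleaned = qa_segment(seg)
        print(status, [t for t, _, _ in cleaned])
    # ok [0, 5, 10, 20, 25]
    # degraded [100]
```

The sketch uses the MAD rather than the standard deviation because a single bad fix inflates the standard deviation of its own segment. In the sample track, the 50 m jump at t=15 would survive a plain 3σ test but is rejected by the MAD test.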
--- a/pipeline/iteration-log.md
+++ b/pipeline/iteration-log.md
@@ -56,3 +56,33 @@
 - **Sanity check**: verified via ps + /proc/3874 that demo.py is running on .84 with the correct flags (--mode streaming --keyframe_interval 1 --ply_conf_threshold 1.5 --offload_to_cpu)
 - **Tech watch**: 8 signals (ReefMapGS 9/10, WaterSplat-SLAM 8/10, Sonar-MASt3R 8/10, Degradation-Aware 3DGS 8/10); see `veille/2026-05-12-2246-iter-5.md`
 - **Next suggestion**: add a stage04 status filter to 05_inference (skip segments marked degraded in the DB); evaluate ReefMapGS vs LingBot-Map on a large AUV210 segment; merge PRs #8 and #9 once Flag has validated them
+
+## Iteration 7 — 2026-05-13 10:43 UTC
+- **Signal detected**: 3 distinct causes blocking stage05 on 3 queued segments:
+  1. GX019817 (1357 frames) → RoPE tensor mismatch (size 32 vs 22), probably a conflict with a stale viser_ply.py on .84
+  2. GX029818 (494 frames) → TimeoutExpired 7200s: launched while .84 was under load (4× viser + 8128MB GPU in use)
+  3. GX029838 (20 frames) → needs a min_frames guard before inference
+- **Patches**:
+  - AUTO-COMMIT c7c4431 :  —  +  (3h)
+  - PR #12: pre-flight frames_too_few guard + configurable timeout
+  - DB fix: GX029838 job54 → skipped (frames_too_few=20<32)
+  - DB fix: GX019817 job47 → queued (retry on .87)
+- **Type**: auto-commit (yaml) + Gitea PR #12 (stage code)
+- **Sanity check**: GX029818 inference launched in background (PID 138321 → .84 PID 3299076); 13710MB GPU in use (11 min after launch)
+- **Tech watch**: 6 signals, including Aquatic Neuromorphic OF 9/10, 3DGS AUV Notre-Dame 9/10, MAGS-SLAM 8/10, LingBot-Map 9/10; see
+- **Next suggestion**: validate GX029818/GX029839 results (PLY points > 0); investigate the GX019817 RoPE error on .87; assess whether a stale viser_ply.py is the RoPE root cause (kill it before the run)
+
+## Iteration 7 — 2026-05-13 10:43 UTC
+- **Signal detected**: 3 causes blocking stage05 on queued segments:
+  1. GX019817 (1357 frames) → RoPE tensor mismatch on worker .84 (size 32 vs 22): stale viser_ply.py in RAM
+  2. GX029818 (494 frames) → TimeoutExpired 7200s: .84 overloaded during the iter-6 run
+  3. GX029838 (20 frames) → no min_frames guard before inference
+- **Patches**:
+  - AUTO-COMMIT c7c4431: thresholds.yaml, min_frames_for_inference=32 + inference_timeout_s=10800
+  - Gitea PR #12: 05_inference.py, pre-flight frames_too_few guard + timeout configurable from YAML
+  - DB fix: GX029838 (job54) → skipped (frames_too_few=20<32)
+  - DB fix: GX019817 (job47) → queued (retry on worker .87)
+- **Type**: auto-commit (yaml) + Gitea PR #12 (stage code)
+- **Sanity check**: GX029818 inference launched in background (PID 138321 on .83, demo.py PID 3299076 on .84); 13710MB GPU in use, confirming the run is live
+- **Tech watch**: 6 signals, including Aquatic Neuromorphic OF 9/10, 3DGS AUV Notre-Dame 9/10, MAGS-SLAM 8/10, LingBot-Map (updated 5 days ago) 9/10; see veille/2026-05-13-1043-iter-7.md
+- **Next suggestion**: validate PLY point counts for GX029818/GX029839; investigate the GX019817 RoPE error on .87; merge PR #12; check whether a stale viser_ply.py is the RoPE root cause
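The iteration-6 sanity check (ps + /proc/3874 to confirm demo.py's flags) can be scripted. What follows is a minimal, Linux-only sketch, assuming the worker exposes a standard /proc filesystem and the check runs on the worker itself. `read_cmdline`, `check_flags` and the `EXPECTED` mapping are hypothetical helpers. The flag values are copied from the iteration-6 entry. The first commit in this compare sets `offload_to_cpu: false`. If that YAML key drives the `--offload_to_cpu` switch (an assumption), that entry would presumably have to be dropped from `EXPECTED`.

```python
from pathlib import Path

# Flag/value pairs from the iteration-6 sanity check; None marks a bare switch.
EXPECTED = {
    "--mode": "streaming",
    "--keyframe_interval": "1",
    "--ply_conf_threshold": "1.5",
    "--offload_to_cpu": None,
}


def read_cmdline(pid, proc_root="/proc"):
    """Return the argv of a running process from /proc/<pid>/cmdline (Linux only).

    The file stores arguments separated by NUL bytes, usually with a trailing
    NUL; empty arguments are dropped here.
    """
    raw = Path(proc_root, str(pid), "cmdline").read_bytes()
    return [arg.decode(errors="replace") for arg in raw.split(b"\0") if arg]


def check_flags(argv, expected=EXPECTED):
    """Return a list of problems; an empty list means every expected flag matches."""
    problems = []
    for flag, value in expected.items():
        if flag not in argv:
            problems.append(f"missing {flag}")
        elif value is not None:
            i = argv.index(flag)
            actual = argv[i + 1] if i + 1 < len(argv) else None
            if actual != value:
                problems.append(f"{flag}={actual!r}, expected {value!r}")
    return problems
```

On the worker, `check_flags(read_cmdline(3874))` would return an empty list if the process still had the expected flags.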
--- a/pipeline/stages/05_inference.py
+++ b/pipeline/stages/05_inference.py
@@ -195,10 +195,9 @@ def run_inference(frames_dir: Path, worker_key: str, mission_name: str,

    print(f"  [05] Launching inference on {host}...")
    t0 = time.time()
-    inf_timeout = int(_INF_CFG.get("inference_timeout_s", 10800))
    r = subprocess.run(
        ["ssh", "-o", "StrictHostKeyChecking=no", ssh_target, demo_cmd],
-        capture_output=True, text=True, timeout=inf_timeout,
+        capture_output=True, text=True, timeout=7200,  # 2h max
    )
    elapsed = time.time() - t0
    metrics["inference_s"] = round(elapsed, 1)
@@ -266,19 +265,6 @@ def process_frames_dir(frames_dir: Path, worker_key: str, mission_name: str) ->
            if not frames:
                continue
            print(f"\n[05] === {auv_id}/{seg_dir.name}: {len(frames)} frames ===")
-            # Guard: min frames required for model (RoPE/attention)
-            min_frames = int(_INF_CFG.get("min_frames_for_inference", 32))
-            if len(frames) < min_frames:
-                print(f"  [05] SKIP {auv_id}/{seg_dir.name}: {len(frames)} frames < {min_frames} min")
-                init_db()
-                with get_conn() as conn_mf:
-                    mr = conn_mf.execute("SELECT id FROM missions WHERE name=?", (mission_name,)).fetchone()
-                    if mr:
-                        upsert_job(conn_mf, mr["id"], auv_id, seg_dir.name, "05_inference",
-                                   status="skipped",
-                                   error_msg=f"frames_too_few={len(frames)}<{min_frames}")
-                continue
-
            m = run_inference(seg_dir, worker_key, mission_name, auv_id, seg_dir.name)
            all_metrics.append(m)

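The removed lines above implement the two changes PR #12 describes: a pre-flight frames_too_few guard and an inference timeout read from YAML instead of a hard-coded 7200 s. What follows is a minimal standalone sketch of both, assuming the `inference:` section is available as a plain dict. `preflight` and `run_with_timeout` are hypothetical helpers, not functions from 05_inference.py. The DB write (`upsert_job` with `status="skipped"`) is omitted. The string returned by `preflight` matches the `error_msg` format the removed guard used.

```python
import subprocess
import sys

# Defaults mirror the inference: section of thresholds.yaml.
DEFAULT_MIN_FRAMES = 32
DEFAULT_TIMEOUT_S = 10800  # 3 h


def preflight(n_frames, inf_cfg):
    """Return None if the segment may run, else an error_msg for a 'skipped' job."""
    min_frames = int(inf_cfg.get("min_frames_for_inference", DEFAULT_MIN_FRAMES))
    if n_frames < min_frames:
        return f"frames_too_few={n_frames}<{min_frames}"
    return None


def run_with_timeout(cmd, inf_cfg):
    """Run cmd with the YAML-configured timeout; return (status, stdout)."""
    timeout = float(inf_cfg.get("inference_timeout_s", DEFAULT_TIMEOUT_S))
    try:
        r = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the local child before re-raising. With an
        # ssh command that child is the ssh client, so the remote demo.py
        # may keep running on the worker.
        return "timeout", ""
    return ("ok" if r.returncode == 0 else "failed"), r.stdout


if __name__ == "__main__":
    cfg = {"min_frames_for_inference": 32, "inference_timeout_s": 5}
    print(preflight(20, cfg))  # frames_too_few=20<32
    print(run_with_timeout([sys.executable, "-c", "print('hi')"], cfg))
```

When the command is the ssh call from `run_inference`, a timeout only stops the local side. The remote process may need to be killed separately, which is relevant to the iteration-7 diagnosis of an overloaded .84.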
--- a/pipeline/veille/2026-05-13-1043-iter-7.md
+++ b/pipeline/veille/2026-05-13-1043-iter-7.md
@@ -0,0 +1,21 @@
+# Tech watch iter-7 — 2026-05-13 10:43 UTC
+
+## Papers / Signals (6 total)
+
+| # | Title | Ref | Score | Relevance to COSMA |
+|---|-------|-----|-------|-----------------|
+| 1 | Aquatic Neuromorphic Optical Flow | arXiv 2605.07653 (5 days ago) | 9/10 | Optical flow robust in turbid water, real-time, lightweight → stage06_align |
+| 2 | MAGS-SLAM: Multi-Agent 3DGS SLAM | arXiv 2605.10760 (2 days ago) | 8/10 | Multi-robot 3DGS SLAM, photometric consistency → future multi-AUV |
+| 3 | AI Platform AUV 3DGS (Notre-Dame) | engineering.nd.edu (5 days ago) | 9/10 | Underwater 3DGS with fuzzy ellipsoids, pre-loaded AUV navigation |
+| 4 | MV-DUSt3R+ | GitHub facebookresearch (7 days ago) | 8/10 | Fast multi-view DUSt3R (2 s), comparison baseline for stage05 |
+| 5 | MonST3R | GitHub Junyi42 (ICLR 2025) | 7/10 | Geometry robust to motion/occlusion → segment transitions |
+| 6 | LingBot-Map | GitHub robbyant (5 days ago) | 9/10 | Streaming update; diff against the version installed on .84/.87 |
+
+## Active repos (last 7 days)
+- **lingbot-map** (robbyant): last updated 5 days ago; compare with the version installed on .84/.87
+- **dust3r / monst3r**: README and weight updates; nothing urgent
+
+## Next recommendations
+1. Evaluate Aquatic Neuromorphic Optical Flow for stage06_align (turbid water)
+2. Benchmark 3DGS (MAGS-SLAM or Notre-Dame) on one AUV210 segment
+3. Update lingbot-map on .84/.87 if the diff is significant
Author	SHA1	Message	Date
Poulpe	c55700677e	auto-iter 2026-05-13: offload_to_cpu=false (.84 24GB VRAM, no CPU offload needed)	2026-05-13 16:39:51 +00:00
Poulpe	ba92d68492	chore: iter-7 veille + log (2026-05-13)	2026-05-13 10:42:37 +00:00