Compare commits


2 Commits

Author SHA1 Message Date
Poulpe
5ead87d59c fix(05-inference): min_frames guard + configurable timeout
- Skip segments with < min_frames_for_inference (32) frames — prevents
  RoPE/attention tensor mismatch (GX029838: 20 frames)
- Timeout now reads inference_timeout_s from thresholds.yaml (default 3h)
  GX029818 (493 frames) timed out at 7200s — raised to 10800

Authored-by: Poulpe <claude@nowyouknow.fr>
2026-05-13 10:37:04 +00:00
Poulpe
c7c4431e72 auto-iter 2026-05-13: inference min_frames=32 + timeout 3h (was 2h)
- min_frames_for_inference: 32 (RoPE/attention needs ≥32 frames)
- inference_timeout_s: 10800 (GX029818 timed out at 7200s with 493 frames)

Authored-by: Poulpe <claude@nowyouknow.fr>
2026-05-13 10:36:28 +00:00
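Both commits justify the new 3 h limit with a single data point: GX029818 (493 frames) did not finish in 7200 s. Here is a rough check of what the new limit covers. It assumes inference time grows roughly linearly with frame count and that runs land on comparable GPUs; neither assumption is stated in the commits. The constants come from the commit messages, and the helper names are hypothetical.

```python
# Back-of-the-envelope check of the timeout change.
# Assumption: inference time grows roughly linearly with frame count.
OLD_TIMEOUT_S = 7200    # previous hard-coded limit (2 h)
NEW_TIMEOUT_S = 10800   # new inference_timeout_s default (3 h)
TIMED_OUT_FRAMES = 493  # GX029818 did not finish within OLD_TIMEOUT_S


def min_seconds_per_frame(timeout_s: float, n_frames: int) -> float:
    """A run that hit the timeout spent at least timeout_s on n_frames frames."""
    return timeout_s / n_frames


def max_frames_within(timeout_s: float, s_per_frame: float) -> int:
    """Largest whole frame count that fits in timeout_s at a constant per-frame cost."""
    return int(timeout_s // s_per_frame)


rate = min_seconds_per_frame(OLD_TIMEOUT_S, TIMED_OUT_FRAMES)
print(f"GX029818 ran at >= {rate:.1f} s/frame")
print(f"at that rate, 3 h fits at most {max_frames_within(NEW_TIMEOUT_S, rate)} frames")
```

The observed rate is at least about 14.6 s/frame. At that rate, 3 h fits at most 739 frames, which is only 1.5× what 2 h allowed. If the linear assumption holds, segments much larger than that could still time out; the config permits up to `max_frame_num: 1024`.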
4 changed files with 16 additions and 76 deletions


@@ -22,6 +22,8 @@ inference:
max_frame_num: 1024
mode: streaming
keyframe_interval: 1
+min_frames_for_inference: 32 # fewer frames → RoPE/attention mismatch errors
+inference_timeout_s: 10800 # 3h (was 7200=2h, GX029818 timed out with 493 frames)
align:
max_translation_m: 500 # sanity check on alignment
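The stage-05 script reads both new keys with `_INF_CFG.get(...)` and falls back to the same defaults. Below is a minimal sketch of that lookup with basic validation added. It assumes `_INF_CFG` is the `inference:` mapping of thresholds.yaml, already parsed into a dict; the YAML loading itself is not shown in the diff. `InferenceLimits` and `load_inference_limits` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Mapping

# Defaults mirror the values added to thresholds.yaml in this compare.
DEFAULT_MIN_FRAMES = 32
DEFAULT_TIMEOUT_S = 10800


@dataclass(frozen=True)
class InferenceLimits:
    min_frames: int
    timeout_s: int


def load_inference_limits(inf_cfg: Mapping[str, Any]) -> InferenceLimits:
    """Read the two new keys from an already-parsed `inference:` mapping.

    Missing keys fall back to the defaults; non-positive values are rejected
    instead of silently disabling the guard or producing a zero timeout.
    """
    min_frames = int(inf_cfg.get("min_frames_for_inference", DEFAULT_MIN_FRAMES))
    timeout_s = int(inf_cfg.get("inference_timeout_s", DEFAULT_TIMEOUT_S))
    if min_frames < 1:
        raise ValueError(f"min_frames_for_inference must be >= 1, got {min_frames}")
    if timeout_s <= 0:
        raise ValueError(f"inference_timeout_s must be > 0, got {timeout_s}")
    return InferenceLimits(min_frames=min_frames, timeout_s=timeout_s)


# A dict shaped like the `inference:` block, without inference_timeout_s.
print(load_inference_limits({"max_frame_num": 1024, "min_frames_for_inference": 32}))
# InferenceLimits(min_frames=32, timeout_s=10800)
```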


@@ -56,16 +56,3 @@
- **Sanity check**: verified via ps + /proc/3874 that demo.py is running on .84 with the correct flags (--mode streaming --keyframe_interval 1 --ply_conf_threshold 1.5 --offload_to_cpu)
- **Tech watch**: 8 signals (ReefMapGS 9/10, WaterSplat-SLAM 8/10, Sonar-MASt3R 8/10, Degradation-Aware 3DGS 8/10); see `veille/2026-05-12-2246-iter-5.md`
- **Next suggestion**: add a stage04 status filter to 05_inference (skip segments marked degraded in the DB); evaluate ReefMapGS vs LingBot-Map on a large AUV210 segment; merge PR #8 and #9 after validation by Flag
## Iteration 6 — 2026-05-13 04:31 UTC
- **Signal detected**: never passed to in stage05 → 10 jobs in error status with no trace (debugging impossible). Secondary cause: 6 segments at stage04 sent to inference by iter-5.
- **Patches**:
- PR #11: — 2 fixes in:
1. passed to on failure
2. stage04=degraded guard before → status=skipped
- DB reset: 6 error jobs → skipped (stage04=degraded); 4 error jobs → queued (stage04=done)
- **Type**: Gitea PR #11 (stage code change)
- **Sanity check**: inference relaunched in the background (PID 66232) on .84 (RTX3090); 15.5G loaded on the GPU (GX019817, 1357 frames, in progress). 4 segments queued: GX019817/GX029818/GX029838/GX029839. Results expected in ~1h.
- **Tech watch**: 8 signals: LingBot-Map updated 5 days ago (check the diff on .84/.87), StreamVGGT ICLR 2026 (alternative for stage05), Aquatic Neuromorphic Optical Flow (useful for stage06_align in turbid water); see veille/2026-05-13-0440-iter-6.md
- **Next suggestion**: merge PR #11 → validate inference on the 4 segments; update lingbot-map on .84/.87; evaluate StreamVGGT on 1 benchmark segment
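The iteration-6 DB reset re-triages errored `05_inference` jobs according to the status of the matching `04_frame_extract` job. The journal does not include the query, so this is a sketch under stated assumptions:

- It uses the standard-library `sqlite3` module. The stage code's `?` placeholders suggest SQLite, but that is not confirmed.
- It assumes a minimal `jobs` table holding only the columns the stage-05 code reads or writes.
- The function name and the segment labels `SEG_A`/`SEG_B` are hypothetical.

```python
import sqlite3

# Hypothetical minimal schema: only the columns the stage-05 code touches.
SCHEMA = """
CREATE TABLE jobs (
    mission_id INTEGER, auv_id TEXT, segment_label TEXT,
    stage TEXT, status TEXT, error_msg TEXT
)
"""

# Correlated subquery: status of the 04_frame_extract job for the same segment.
STAGE04_STATUS = """(SELECT s.status FROM jobs AS s
    WHERE s.mission_id = jobs.mission_id AND s.auv_id = jobs.auv_id
      AND s.segment_label = jobs.segment_label
      AND s.stage = '04_frame_extract')"""
ERRORED_05 = "stage = '05_inference' AND status = 'error' AND mission_id = ?"


def reset_errored_inference(conn: sqlite3.Connection, mission_id: int) -> tuple[int, int]:
    """Skip errored 05 jobs whose stage 04 is degraded; requeue those whose stage 04 is done.

    Returns (n_skipped, n_requeued).
    """
    skipped = conn.execute(
        "UPDATE jobs SET status = 'skipped', error_msg = 'stage04=degraded, skipped' "
        f"WHERE {ERRORED_05} AND {STAGE04_STATUS} = 'degraded'", (mission_id,)
    ).rowcount
    requeued = conn.execute(
        "UPDATE jobs SET status = 'queued', error_msg = NULL "
        f"WHERE {ERRORED_05} AND {STAGE04_STATUS} = 'done'", (mission_id,)
    ).rowcount
    conn.commit()
    return skipped, requeued


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany("INSERT INTO jobs VALUES (?, ?, ?, ?, ?, ?)", [
    (1, "AUV210", "SEG_A", "04_frame_extract", "degraded", None),
    (1, "AUV210", "SEG_A", "05_inference", "error", ""),
    (1, "AUV210", "SEG_B", "04_frame_extract", "done", None),
    (1, "AUV210", "SEG_B", "05_inference", "error", ""),
])
print(reset_errored_inference(conn, 1))  # (1, 1)
```

Both updates only touch rows still in `error`, so running the reset a second time is a no-op.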


@@ -195,9 +195,10 @@ def run_inference(frames_dir: Path, worker_key: str, mission_name: str,
print(f" [05] Launching inference on {host}...")
t0 = time.time()
+inf_timeout = int(_INF_CFG.get("inference_timeout_s", 10800))
r = subprocess.run(
["ssh", "-o", "StrictHostKeyChecking=no", ssh_target, demo_cmd],
-capture_output=True, text=True, timeout=7200, # 2h max
+capture_output=True, text=True, timeout=inf_timeout,
)
elapsed = time.time() - t0
metrics["inference_s"] = round(elapsed, 1)
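When `subprocess.run` exceeds `timeout`, it kills the child and raises `subprocess.TimeoutExpired`. The hunk does not show where that exception is caught. The sketch below assumes a caller that turns a timeout into a metrics dict rather than letting the exception escape. The `"timeout"` status value and the function name are hypothetical.

One caveat, hedged: only the local `ssh` client is killed. Without a remote pty, `demo.py` may keep running on the worker and should be checked separately.

```python
import subprocess
import sys
import time


def run_with_timeout(cmd: list[str], timeout_s: float) -> dict:
    """Run cmd and report the outcome as a metrics dict instead of raising."""
    t0 = time.time()
    try:
        r = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed and reaped the local child here.
        return {"status": "timeout",
                "inference_s": round(time.time() - t0, 1),
                "error": f"timeout after {timeout_s}s"}
    out = {"status": "ok" if r.returncode == 0 else "error",
           "inference_s": round(time.time() - t0, 1)}
    if r.returncode != 0:
        out["error"] = r.stderr[-2000:]  # keep only the tail of stderr
    return out


if __name__ == "__main__":
    slow = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_with_timeout(slow, timeout_s=0.5)["status"])  # timeout
    print(run_with_timeout([sys.executable, "-c", "pass"], timeout_s=10)["status"])  # ok
```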
@@ -265,25 +266,18 @@ def process_frames_dir(frames_dir: Path, worker_key: str, mission_name: str) ->
if not frames:
continue
print(f"\n[05] === {auv_id}/{seg_dir.name}: {len(frames)} frames ===")
-# Guard: skip if stage04 is degraded (no useful frames)
-init_db()
-with get_conn() as conn_check:
-mission_row_check = conn_check.execute(
-"SELECT id FROM missions WHERE name=?", (mission_name,)
-).fetchone()
-if mission_row_check:
-s04 = conn_check.execute(
-"SELECT status FROM jobs WHERE mission_id=? AND auv_id=? "
-"AND segment_label=? AND stage='04_frame_extract'",
-(mission_row_check["id"], auv_id, seg_dir.name),
-).fetchone()
-if s04 and s04["status"] == "degraded":
-print(f" [05] SKIP {auv_id}/{seg_dir.name}: stage04=degraded")
-upsert_job(conn_check, mission_row_check["id"], auv_id, seg_dir.name,
-"05_inference", status="skipped",
-error_msg="stage04=degraded, skipped")
-continue
+# Guard: min frames required for model (RoPE/attention)
+min_frames = int(_INF_CFG.get("min_frames_for_inference", 32))
+if len(frames) < min_frames:
+print(f" [05] SKIP {auv_id}/{seg_dir.name}: {len(frames)} frames < {min_frames} min")
+init_db()
+with get_conn() as conn_mf:
+mr = conn_mf.execute("SELECT id FROM missions WHERE name=?", (mission_name,)).fetchone()
+if mr:
+upsert_job(conn_mf, mr["id"], auv_id, seg_dir.name, "05_inference",
+status="skipped",
+error_msg=f"frames_too_few={len(frames)}<{min_frames}")
+continue
m = run_inference(seg_dir, worker_key, mission_name, auv_id, seg_dir.name)
all_metrics.append(m)
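Each guard shown in this hunk opens its own DB connection only to decide whether to skip. The decision itself depends on two inputs: the frame count and the stage-04 status. Here is a sketch of that decision as a pure function. The name `skip_reason` is hypothetical; the `error_msg` strings are copied from the hunk.

```python
from typing import Optional

MIN_FRAMES_DEFAULT = 32  # same default as min_frames_for_inference


def skip_reason(n_frames: int, stage04_status: Optional[str],
                min_frames: int = MIN_FRAMES_DEFAULT) -> Optional[str]:
    """Return the error_msg to store for a skipped 05_inference job, or None to run it."""
    if stage04_status == "degraded":
        return "stage04=degraded, skipped"
    if n_frames < min_frames:
        return f"frames_too_few={n_frames}<{min_frames}"
    return None


print(skip_reason(20, "done"))   # frames_too_few=20<32
print(skip_reason(493, "done"))  # None
```

The first call matches the GX029838 case from the commit message (20 frames). Keeping the decision pure would let the loop look up the mission once and call `upsert_job` once per skipped segment. That is a suggested refactor, not what the diff does.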
@@ -298,7 +292,6 @@ def process_frames_dir(frames_dir: Path, worker_key: str, mission_name: str) ->
conn, mission_row["id"], auv_id, seg_dir.name, "05_inference",
status="done" if m.get("status") == "ok" else m.get("status", "error"),
output_path=m.get("ply", ""),
-error_msg=m.get("error", "") if m.get("status") != "ok" else None,
)
record_metric(conn, job_id, "ply_points", value=m.get("n_points", 0),
pass_fail="pass" if m.get("n_points", 0) > 100 else "fail")
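The `upsert_job` and `record_metric` arguments in this hunk encode two small rules:

- Any status other than `ok` is stored as-is, defaulting to `error`.
- A point cloud passes only with more than 100 points.

Pulled out as pure functions, they are easy to unit-test. The function names below are hypothetical; the expressions mirror the hunk.

```python
from typing import Any, Mapping

MIN_PLY_POINTS = 100  # "pass" requires strictly more points than this


def job_fields(m: Mapping[str, Any]) -> dict:
    """Map a run_inference metrics dict to upsert_job keyword values."""
    ok = m.get("status") == "ok"
    return {
        "status": "done" if ok else m.get("status", "error"),
        "output_path": m.get("ply", ""),
        "error_msg": None if ok else m.get("error", ""),
    }


def ply_pass_fail(m: Mapping[str, Any]) -> str:
    """Verdict recorded with the ply_points metric."""
    return "pass" if m.get("n_points", 0) > MIN_PLY_POINTS else "fail"


print(job_fields({"status": "timeout", "error": "timeout after 10800s"}))
print(ply_pass_fail({"n_points": 100}))  # fail
```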


@@ -1,42 +0,0 @@
# Tech watch iter-6 — 2026-05-13 04:40 UTC
## Signals (threshold ≥ 6/10)
### Score 9/10
**Aquatic Neuromorphic Optical Flow** — arxiv:2605.07653 (5d)
Neuromorphic framework for underwater optical-flow estimation (event streams).
→ Relevant for stage 06_align: better inter-frame AUV tracking in turbid conditions.
**LingBot-Map** — github.com/robbyant/lingbot-map (updated 5d ago)
Foundation model for streaming 3D reconstruction. This is the version used in production; check the diff.
→ ACTION: compare the version on .84/.87 against the HEAD commit; update if it includes a fix.
### Score 8/10
**StreamVGGT** [ICLR 2026] — github.com/wzzheng/StreamVGGT
Real-time streaming 4D geometry transformer.
→ Potential alternative to LingBot-Map for stage 05; benchmark on an AUV210 segment.
**All-3R-SLAM-in-this-Repo** — github.com/3D-Vision-World
Collection of DUSt3R / MonST3R / CUT3R / LingBot-Map.
→ Reference for comparing variants; CUT3R (Continuous Updating) looks interesting for AUVs.
**Awesome-DUSt3R** — github.com/ruili3/awesome-dust3r
CUT3R resources: inference of unseen regions.
→ CUT3R worth evaluating on a mission with limited overlap between views.
### Score 7/10
**AI-Aided AUV Navigation** — arxiv:2605.04672 (7d)
AI sensor fusion + adaptive algorithms for AUV navigation.
→ Potentially useful for stage 06_align (USBL + IMU fusion).
### Score 6/10
**HY-World 2.0** — github.com/Tencent-Hunyuan/HY-World-2.0 (1d)
Multimodal 3D world model: point clouds, depth, normals.
→ Worth watching; too general-purpose for now.
## Summary
8 signals (6 with score ≥ 6). Top signal: update LingBot-Map on the workers + evaluate StreamVGGT.