# live-reconstruction

Live 3D reconstruction from a mobile phone camera using lingbot-map in streaming mode.
```
Mobile browser ── getUserMedia ──> JPEG frames
      ── WebSocket ──────> aiohttp server (Python, this repo)
                             ├── lingbot-map streaming inference (KV cache, bfloat16)
                             └── viser scene update (rolling window of point clouds)
Desktop/tablet ───────────────> viser page (http://host:8081) = interactive 3D viewer
```
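The "rolling window of point clouds" in the diagram can be thought of as a bounded queue: each incoming frame contributes one point-cloud node to the viser scene, and the oldest node is evicted once the window is full. A minimal sketch of that bookkeeping, assuming a window size and node-naming scheme that are illustrative, not taken from this repo's code:

```python
from collections import deque
from typing import Optional, Tuple

MAX_CLOUDS = 30  # assumed window size, not the repo's actual default


class RollingCloudWindow:
    """Track which per-frame point-cloud nodes should exist in the viewer."""

    def __init__(self, max_clouds: int = MAX_CLOUDS):
        self.names = deque()
        self.max_clouds = max_clouds

    def push(self, frame_idx: int) -> Tuple[str, Optional[str]]:
        """Register a new frame; return (node_to_add, node_to_remove_or_None)."""
        name = f"/clouds/frame_{frame_idx}"
        self.names.append(name)
        # Evict the oldest cloud once the window exceeds its capacity.
        evicted = self.names.popleft() if len(self.names) > self.max_clouds else None
        return name, evicted
```

In the real server the returned names would correspond to viser scene nodes: add a point cloud under the new name, and remove the evicted one so GPU/viewer memory stays bounded.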
## Setup

- Check out this repo next to a working lingbot-map checkout:

  ```
  ~/ai-video/lingbot-map/           # clone of Robbyant/lingbot-map (with venv + pip install -e .)
  ~/ai-video/live-reconstruction/   # this repo
  ```
- Download the model weights (once):

  ```
  cd ~/ai-video/lingbot-map
  .venv/bin/python -c "from huggingface_hub import snapshot_download; \
      snapshot_download('robbyant/lingbot-map', local_dir='./checkpoints/lingbot-map')"
  ```
- Install extra deps in the lingbot-map venv:

  ```
  ~/ai-video/lingbot-map/.venv/bin/pip install aiohttp pillow
  ```
## Run

```
cd ~/ai-video/live-reconstruction
~/ai-video/lingbot-map/.venv/bin/python server_live.py \
    --model_path ~/ai-video/lingbot-map/checkpoints/lingbot-map/lingbot-map.pt
```
Then:

- Open `http://<host>:8080/` on your phone → tap Start camera.
- Open `http://<host>:8081/` on a desktop browser → interactive viser 3D viewer.
## Constraints

- Needs a CUDA GPU; tested on an RTX 3060 12 GB.
- Peak VRAM is ~10 GB with bfloat16 + SDPA fallback (FlashInfer not installed).
- Throughput on the 3060 is ~2 frames/s; the mobile page throttles to 2 FPS by default.
- `getUserMedia` requires HTTPS outside the local network, so expose the server over LAN / VPN only for now.
- Free up GPU memory before launching: stop ollama, ComfyUI, fish-speech, etc.
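The 2 FPS cap above is a minimum-interval throttle: a frame is only accepted if at least `1/fps` seconds have passed since the last accepted one. A sketch of the same idea server-side (the class name and injectable clock are illustrative, not this repo's code):

```python
import time


class FrameThrottle:
    """Drop frames that arrive faster than 1/fps seconds apart."""

    def __init__(self, fps: float = 2.0, clock=time.monotonic):
        self.min_interval = 1.0 / fps
        self.clock = clock            # injectable for testing
        self.last = float("-inf")     # so the first frame always passes

    def allow(self) -> bool:
        now = self.clock()
        if now - self.last >= self.min_interval:
            self.last = now
            return True
        return False
```

Doing this on both ends is cheap insurance: even if a client ignores the cap, a server-side throttle keeps the inference queue from backing up past the GPU's ~2 frames/s.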
## Env

- `LINGBOT_MAP_DIR`: overrides the path to the upstream lingbot-map checkout (default: `../lingbot-map`).
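A sketch of how such an override is typically resolved; the helper name is illustrative, only the variable name and default come from this README:

```python
import os
from pathlib import Path


def lingbot_map_dir() -> Path:
    # LINGBOT_MAP_DIR wins if set; otherwise fall back to the sibling checkout.
    return Path(os.environ.get("LINGBOT_MAP_DIR", "../lingbot-map")).expanduser()
```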
## License
Code in this repo: MIT. Upstream model code: see Robbyant/lingbot-map.