live-reconstruction

Live 3D reconstruction from a mobile phone camera using lingbot-map in streaming mode.

Mobile browser ── getUserMedia ──> JPEG frames
              ── WebSocket ──────> aiohttp server (Python, this repo)
                                   ├── lingbot-map streaming inference (KV cache, bfloat16)
                                   └── viser scene update (rolling window of point clouds)

Desktop/tablet ───────────────> viser page (http://host:8081) = interactive 3D viewer
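The "rolling window of point clouds" in the diagram can be sketched with a bounded deque: each new frame's point cloud is appended, and once the window is full the oldest cloud is evicted from the viewer. This is an illustrative stdlib-only sketch (the window size, class name, and frame representation are assumptions, not taken from `server_live.py`):

```python
from collections import deque

WINDOW = 30  # assumed window size; the real value lives in server_live.py

class RollingScene:
    """Keep only the most recent WINDOW point clouds visible."""

    def __init__(self, maxlen=WINDOW):
        # deque with maxlen evicts the oldest entry automatically on append
        self.frames = deque(maxlen=maxlen)

    def push(self, frame_id, points):
        """Add one frame's point cloud; oldest frame falls out when full."""
        self.frames.append((frame_id, points))

    def visible_ids(self):
        """IDs of the frames currently in the window (oldest first)."""
        return [fid for fid, _ in self.frames]

scene = RollingScene(maxlen=3)
for i in range(5):
    scene.push(i, [[0.0, 0.0, float(i)]])
print(scene.visible_ids())  # -> [2, 3, 4]; frames 0 and 1 were evicted
```

In the real server, eviction would also remove the corresponding point-cloud node from the viser scene so the viewer does not grow unboundedly.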

Setup

  1. Check out this repo next to a working lingbot-map checkout:

    ~/ai-video/lingbot-map/         # clone of Robbyant/lingbot-map (with venv + pip install -e .)
    ~/ai-video/live-reconstruction/ # this repo
    
  2. Download the model weights (once):

    cd ~/ai-video/lingbot-map
    .venv/bin/python -c "from huggingface_hub import snapshot_download; \
        snapshot_download('robbyant/lingbot-map', local_dir='./checkpoints/lingbot-map')"
    
  3. Install extra deps in the lingbot-map venv:

    ~/ai-video/lingbot-map/.venv/bin/pip install aiohttp pillow
    
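As a quick sanity check after step 2, you can verify that the weights actually landed where the Run command below expects them. A minimal sketch (the helper name is ours; the path matches `--model_path`):

```python
from pathlib import Path

def checkpoint_ready(path: str) -> bool:
    """Return True if the downloaded weights file exists and is non-empty."""
    p = Path(path).expanduser()
    return p.is_file() and p.stat().st_size > 0

# Path matches the --model_path used in the Run section.
print(checkpoint_ready("~/ai-video/lingbot-map/checkpoints/lingbot-map/lingbot-map.pt"))
```

If this prints `False`, rerun the `snapshot_download` step before launching the server.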

Run

cd ~/ai-video/live-reconstruction
~/ai-video/lingbot-map/.venv/bin/python server_live.py \
    --model_path ~/ai-video/lingbot-map/checkpoints/lingbot-map/lingbot-map.pt

Then:

  • Open http://<host>:8080/ on your phone → tap Start camera.
  • Open http://<host>:8081/ on a desktop browser → interactive viser 3D viewer.
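The 2 FPS throttle mentioned under Constraints below can also be enforced server-side by dropping frames that arrive too quickly, so a misbehaving client cannot overrun inference. A stdlib-only sketch (class name and default rate are assumptions):

```python
import time

class FrameThrottle:
    """Drop frames that arrive faster than max_fps allows."""

    def __init__(self, max_fps=2.0):
        self.min_interval = 1.0 / max_fps
        self.last = float("-inf")  # timestamp of last accepted frame

    def accept(self, now=None):
        """Return True if this frame should be processed, False to drop it."""
        now = time.monotonic() if now is None else now
        if now - self.last >= self.min_interval:
            self.last = now
            return True
        return False

# Frames arriving every 100 ms against a 2 FPS (500 ms) budget:
t = FrameThrottle(max_fps=2.0)
print([t.accept(now=x * 0.1) for x in range(10)])
# -> only the frames at t=0.0s and t=0.5s are accepted
```

In the WebSocket handler, a dropped frame would simply be discarded without entering the inference queue.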

Constraints

  • Needs a CUDA GPU; tested on RTX 3060 12 GB.
  • Peak VRAM ~10 GB with bfloat16 + SDPA fallback (FlashInfer not installed).
  • Throughput on 3060: ~2 frames/s. The mobile page throttles to 2 FPS by default.
  • getUserMedia requires HTTPS on WAN — LAN / VPN exposure only for now.
  • Free up GPU memory before launching: stop ollama, ComfyUI, fish-speech, etc.

Env

  • LINGBOT_MAP_DIR — override path to the upstream lingbot-map checkout (default: ../lingbot-map).
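Resolution of this variable presumably looks like the following sketch (a guess at the documented behavior, not the actual code in `server_live.py`):

```python
import os
from pathlib import Path

def lingbot_map_dir() -> Path:
    """LINGBOT_MAP_DIR overrides the upstream checkout path; default ../lingbot-map."""
    return Path(os.environ.get("LINGBOT_MAP_DIR", "../lingbot-map")).resolve()

os.environ.pop("LINGBOT_MAP_DIR", None)
print(lingbot_map_dir().name)  # -> lingbot-map (the relative default, resolved)
```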

License

Code in this repo: MIT. Upstream model code: see Robbyant/lingbot-map.