# live-reconstruction

Live 3D reconstruction from a mobile phone camera using lingbot-map in streaming mode.
```
Mobile browser ── getUserMedia ──> JPEG frames
      ── WebSocket ──────> aiohttp server (Python, this repo)
                             ├── lingbot-map streaming inference (KV cache, bfloat16)
                             └── viser scene update (rolling window of point clouds)
Desktop/tablet ───────────────> viser page (http://host:8081) = interactive 3D viewer
```
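The "rolling window of point clouds" in the diagram can be thought of as a bounded queue: each incoming frame contributes one point-cloud node to the viser scene, and the oldest node is evicted once the window is full. A minimal sketch of that bookkeeping, assuming a window size and node-naming scheme that are illustrative, not taken from this repo's code:

```python
from collections import deque
from typing import Optional, Tuple

MAX_CLOUDS = 30  # assumed window size, not the repo's actual default


class RollingCloudWindow:
    """Track which per-frame point-cloud nodes should exist in the viewer."""

    def __init__(self, max_clouds: int = MAX_CLOUDS):
        self.names = deque()
        self.max_clouds = max_clouds

    def push(self, frame_idx: int) -> Tuple[str, Optional[str]]:
        """Register a new frame; return (node_to_add, node_to_remove_or_None)."""
        name = f"/clouds/frame_{frame_idx}"
        self.names.append(name)
        # Evict the oldest cloud once the window exceeds its capacity.
        evicted = self.names.popleft() if len(self.names) > self.max_clouds else None
        return name, evicted
```

In the real server the returned names would correspond to viser scene nodes: add a point cloud under the new name, and remove the evicted one so GPU/viewer memory stays bounded.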
## Setup

- Check out this repo next to a working lingbot-map checkout:

  ```
  ~/ai-video/lingbot-map/           # clone of Robbyant/lingbot-map (with venv + pip install -e .)
  ~/ai-video/live-reconstruction/   # this repo
  ```
- Download the model weights (once):

  ```
  cd ~/ai-video/lingbot-map
  .venv/bin/python -c "from huggingface_hub import snapshot_download; \
      snapshot_download('robbyant/lingbot-map', local_dir='./checkpoints/lingbot-map')"
  ```
- Install extra deps in the lingbot-map venv:

  ```
  ~/ai-video/lingbot-map/.venv/bin/pip install aiohttp pillow
  ```
## Run

```
cd ~/ai-video/live-reconstruction
~/ai-video/lingbot-map/.venv/bin/python server_live.py \
    --model_path ~/ai-video/lingbot-map/checkpoints/lingbot-map/lingbot-map.pt
```
Then:

- Open `http://<host>:8080/` on your phone → tap Start camera.
- Open `http://<host>:8081/` on a desktop browser → interactive viser 3D viewer.
## Constraints

- Needs a CUDA GPU; tested on an RTX 3060 12 GB.
- Peak VRAM is ~10 GB with bfloat16 + SDPA fallback (FlashInfer not installed).
- Throughput on the 3060 is ~2 frames/s; the mobile page throttles to 2 FPS by default.
- `getUserMedia` requires HTTPS outside the local network, so expose the server over LAN / VPN only for now.
- Free up GPU memory before launching: stop ollama, ComfyUI, fish-speech, etc.
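The 2 FPS cap above is a minimum-interval throttle: a frame is only accepted if at least `1/fps` seconds have passed since the last accepted one. A sketch of the same idea server-side (the class name and injectable clock are illustrative, not this repo's code):

```python
import time


class FrameThrottle:
    """Drop frames that arrive faster than 1/fps seconds apart."""

    def __init__(self, fps: float = 2.0, clock=time.monotonic):
        self.min_interval = 1.0 / fps
        self.clock = clock            # injectable for testing
        self.last = float("-inf")     # so the first frame always passes

    def allow(self) -> bool:
        now = self.clock()
        if now - self.last >= self.min_interval:
            self.last = now
            return True
        return False
```

Doing this on both ends is cheap insurance: even if a client ignores the cap, a server-side throttle keeps the inference queue from backing up past the GPU's ~2 frames/s.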
## Env

- `LINGBOT_MAP_DIR`: overrides the path to the upstream lingbot-map checkout (default: `../lingbot-map`).
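A sketch of how such an override is typically resolved; the helper name is illustrative, only the variable name and default come from this README:

```python
import os
from pathlib import Path


def lingbot_map_dir() -> Path:
    # LINGBOT_MAP_DIR wins if set; otherwise fall back to the sibling checkout.
    return Path(os.environ.get("LINGBOT_MAP_DIR", "../lingbot-map")).expanduser()
```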
## License
Code in this repo: MIT. Upstream model code: see Robbyant/lingbot-map.