# live-reconstruction

Live 3D reconstruction from a mobile phone camera using [lingbot-map](https://github.com/Robbyant/lingbot-map) in streaming mode.

```
Mobile browser ── getUserMedia ──> JPEG frames ── WebSocket ──> aiohttp server (Python, this repo)
                                                                ├── lingbot-map streaming inference (KV cache, bfloat16)
                                                                └── viser scene update (rolling window of point clouds)

Desktop/tablet ──────────────────────────────────────────────> viser page (http://host:8081)
                                                                = interactive 3D viewer
```

## Setup

1. Check out this repo next to a working `lingbot-map` checkout:

   ```
   ~/ai-video/lingbot-map/          # clone of Robbyant/lingbot-map (with venv + pip install -e .)
   ~/ai-video/live-reconstruction/  # this repo
   ```

2. Download the model weights (once):

   ```bash
   cd ~/ai-video/lingbot-map
   .venv/bin/python -c "from huggingface_hub import snapshot_download; \
     snapshot_download('robbyant/lingbot-map', local_dir='./checkpoints/lingbot-map')"
   ```

3. Install the extra dependencies in the lingbot-map venv:

   ```bash
   ~/ai-video/lingbot-map/.venv/bin/pip install aiohttp pillow
   ```

## Run

```bash
cd ~/ai-video/live-reconstruction
~/ai-video/lingbot-map/.venv/bin/python server_live.py \
  --model_path ~/ai-video/lingbot-map/checkpoints/lingbot-map/lingbot-map.pt
```

Then:

- Open `http://<host>:8080/` on your phone → tap **Start camera**.
- Open `http://<host>:8081/` on a desktop browser → interactive viser 3D viewer.

## Constraints

- Needs a CUDA GPU; tested on an RTX 3060 12 GB.
- Peak VRAM is ~10 GB with bfloat16 + SDPA fallback (FlashInfer not installed).
- Throughput on the 3060 is ~2 frames/s; the mobile page throttles to 2 FPS by default.
- `getUserMedia` requires HTTPS on WAN — LAN / VPN exposure only for now.
- Free up GPU memory before launching: stop `ollama`, ComfyUI, fish-speech, etc.

## Env

- `LINGBOT_MAP_DIR` — override path to the upstream lingbot-map checkout (default: `../lingbot-map`).

## License

Code in this repo: MIT. Upstream model code: see Robbyant/lingbot-map.
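The "rolling window of point clouds" in the viser scene amounts to a fixed-size buffer that evicts the oldest cloud whenever a new frame's points are added. A minimal sketch of that bookkeeping (the class name, the window size, and the eviction callback are illustrative assumptions, not this repo's actual code — in the real server the callback would remove the corresponding scene node from viser):

```python
from collections import deque


class RollingCloudWindow:
    """Keep at most `maxlen` point clouds, evicting the oldest on overflow."""

    def __init__(self, maxlen: int = 20, on_evict=None):
        self._clouds = deque()
        self._maxlen = maxlen
        self._on_evict = on_evict  # hypothetical hook: remove the viser node here

    def push(self, frame_id: int, points) -> None:
        """Add one frame's point cloud; drop the oldest if over capacity."""
        self._clouds.append((frame_id, points))
        while len(self._clouds) > self._maxlen:
            old_id, _ = self._clouds.popleft()
            if self._on_evict:
                self._on_evict(old_id)

    def frame_ids(self):
        return [fid for fid, _ in self._clouds]


# Simulate 5 incoming frames through a window of 3:
evicted = []
win = RollingCloudWindow(maxlen=3, on_evict=evicted.append)
for i in range(5):
    win.push(i, points=[])
# win now holds the 3 newest frames; the 2 oldest were evicted
```

Bounding the window like this is what keeps both VRAM-adjacent host memory and the viser scene graph from growing without limit during a long capture session.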
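The mobile page's 2 FPS throttle boils down to dropping any frame that arrives less than 1/2 s after the last accepted one; the same gate can also be applied server-side so a misbehaving client cannot flood inference. A sketch under that assumption, with the clock injected for testability (names are illustrative, not taken from this repo):

```python
class FrameThrottle:
    """Accept at most `fps` frames per second; report drops via accept()."""

    def __init__(self, fps: float, clock):
        self._min_interval = 1.0 / fps
        self._clock = clock          # e.g. time.monotonic in production
        self._last = float("-inf")   # so the very first frame is accepted

    def accept(self) -> bool:
        now = self._clock()
        if now - self._last >= self._min_interval:
            self._last = now
            return True
        return False


# Simulated 10 FPS camera feed pushed through a 2 FPS throttle:
t = 0.0
throttle = FrameThrottle(fps=2.0, clock=lambda: t)
accepted = []
for i in range(10):
    t = i * 0.1                      # a frame every 100 ms
    if throttle.accept():
        accepted.append(i)
# only frames spaced >= 500 ms apart survive
```

Throttling at the source (the browser) is still preferable, since it also saves uplink bandwidth on the phone.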