# live-reconstruction
Live 3D reconstruction from a mobile phone camera using
[lingbot-map](https://github.com/Robbyant/lingbot-map) in streaming mode.
```
Mobile browser ── getUserMedia ──> JPEG frames
       └── WebSocket ──> aiohttp server (Python, this repo)
                            ├── lingbot-map streaming inference (KV cache, bfloat16)
                            └── viser scene update (rolling window of point clouds)

Desktop/tablet ────────> viser page (http://host:8081) = interactive 3D viewer
```
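The "rolling window of point clouds" means the viewer only ever holds the most recent frames, so memory stays bounded during a long session. A minimal sketch of that policy, assuming a viser-like scene object with `add_point_cloud()` returning a handle that supports `.remove()` (the actual server's names and window size may differ):

```python
from collections import deque


class RollingPointCloudWindow:
    """Keep at most `max_frames` point clouds in the scene; evict the oldest.

    Sketch only: `scene` stands in for a viser-style scene API whose
    add_point_cloud() returns a removable handle (assumed interface).
    """

    def __init__(self, scene, max_frames=30):
        self.scene = scene
        self.max_frames = max_frames
        self.handles = deque()  # (frame_id, handle), oldest first

    def add_frame(self, frame_id, points, colors):
        handle = self.scene.add_point_cloud(f"/frames/{frame_id}", points, colors)
        self.handles.append((frame_id, handle))
        # Evict until we are back within the window.
        while len(self.handles) > self.max_frames:
            _, oldest = self.handles.popleft()
            oldest.remove()  # drop the oldest cloud from the viewer
```

A `deque` keeps eviction O(1); the same structure works whether the handle removal actually deletes scene nodes or just hides them.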
## Setup
1. Check out this repo next to a working `lingbot-map` checkout:
```
~/ai-video/lingbot-map/ # clone of Robbyant/lingbot-map (with venv + pip install -e .)
~/ai-video/live-reconstruction/ # this repo
```
2. Download the model weights (once):
```bash
cd ~/ai-video/lingbot-map
.venv/bin/python -c "from huggingface_hub import snapshot_download; \
snapshot_download('robbyant/lingbot-map', local_dir='./checkpoints/lingbot-map')"
```
3. Install extra deps in the lingbot-map venv:
```bash
~/ai-video/lingbot-map/.venv/bin/pip install aiohttp pillow
```
## Run
```bash
cd ~/ai-video/live-reconstruction
~/ai-video/lingbot-map/.venv/bin/python server_live.py \
    --model_path ~/ai-video/lingbot-map/checkpoints/lingbot-map/lingbot-map.pt
```
Then:
- Open `http://<host>:8080/` on your phone → tap **Start camera**.
- Open `http://<host>:8081/` on a desktop browser → interactive viser 3D viewer.
## Constraints
- Needs a CUDA GPU; tested on RTX 3060 12 GB.
- Peak VRAM ~10 GB with bfloat16 + SDPA fallback (FlashInfer not installed).
- Throughput on 3060: ~2 frames/s. The mobile page throttles to 2 FPS by default.
- `getUserMedia` requires a secure context (HTTPS) except on localhost, so expose the phone page over LAN or VPN only for now.
- Free up GPU memory before launching: stop `ollama`, ComfyUI, fish-speech, etc.
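Since inference tops out around 2 frames/s, frames arriving faster than that are best dropped rather than queued (a queue would only add latency). A sketch of a simple server-side admission throttle under that assumption; the actual server may coalesce or queue frames differently, and the client-side 2 FPS cap already does most of this work:

```python
import time


class FrameThrottle:
    """Admit at most `max_fps` frames; the caller discards rejected ones.

    Sketch of the drop-excess-frames policy implied by the ~2 frames/s
    budget, not the repo's actual implementation.
    """

    def __init__(self, max_fps=2.0, clock=time.monotonic):
        self.min_interval = 1.0 / max_fps
        self.clock = clock  # injectable for testing
        self.last = float("-inf")

    def admit(self) -> bool:
        now = self.clock()
        if now - self.last >= self.min_interval:
            self.last = now
            return True
        return False  # too soon: caller should drop this frame
```

Dropping at the door keeps the reconstruction near real time even when the phone briefly bursts above the target rate.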
## Env
- `LINGBOT_MAP_DIR` — override path to the upstream lingbot-map checkout
(default: `../lingbot-map`).
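The lookup described above can be sketched as follows (the function name is illustrative; `server_live.py` may resolve the path differently):

```python
import os
from pathlib import Path


def resolve_lingbot_map_dir(repo_root: Path) -> Path:
    """Return the upstream lingbot-map checkout directory.

    Uses LINGBOT_MAP_DIR when set, otherwise falls back to the sibling
    directory ../lingbot-map, per the README's documented default.
    """
    override = os.environ.get("LINGBOT_MAP_DIR")
    if override:
        return Path(override).expanduser().resolve()
    return (repo_root / ".." / "lingbot-map").resolve()
```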
## License
Code in this repo: MIT. Upstream model code: see Robbyant/lingbot-map.