# live-reconstruction
Live 3D reconstruction from a mobile phone camera using
[lingbot-map](https://github.com/Robbyant/lingbot-map) in streaming mode.
```
Mobile browser ── getUserMedia ──> JPEG frames
       └── WebSocket ──> aiohttp server (Python, this repo)
                            ├── lingbot-map streaming inference (KV cache, bfloat16)
                            └── viser scene update (rolling window of point clouds)

Desktop/tablet ────────> viser page (http://host:8081) = interactive 3D viewer
```
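The "rolling window of point clouds" means the viewer only ever holds the most recent frames, so memory stays bounded during a long session. A minimal sketch of that policy, assuming a viser-like scene object with `add_point_cloud()` returning a handle that supports `.remove()` (the actual server's names and window size may differ):

```python
from collections import deque


class RollingPointCloudWindow:
    """Keep at most `max_frames` point clouds in the scene; evict the oldest.

    Sketch only: `scene` stands in for a viser-style scene API whose
    add_point_cloud() returns a removable handle (assumed interface).
    """

    def __init__(self, scene, max_frames=30):
        self.scene = scene
        self.max_frames = max_frames
        self.handles = deque()  # (frame_id, handle), oldest first

    def add_frame(self, frame_id, points, colors):
        handle = self.scene.add_point_cloud(f"/frames/{frame_id}", points, colors)
        self.handles.append((frame_id, handle))
        # Evict until we are back within the window.
        while len(self.handles) > self.max_frames:
            _, oldest = self.handles.popleft()
            oldest.remove()  # drop the oldest cloud from the viewer
```

A `deque` keeps eviction O(1); the same structure works whether the handle removal actually deletes scene nodes or just hides them.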
## Setup
1. Check out this repo next to a working `lingbot-map` checkout:
```
~/ai-video/lingbot-map/ # clone of Robbyant/lingbot-map (with venv + pip install -e .)
~/ai-video/live-reconstruction/ # this repo
```
2. Download the model weights (once):
```bash
cd ~/ai-video/lingbot-map
.venv/bin/python -c "from huggingface_hub import snapshot_download; \
snapshot_download('robbyant/lingbot-map', local_dir='./checkpoints/lingbot-map')"
```
3. Install extra deps in the lingbot-map venv:
```bash
~/ai-video/lingbot-map/.venv/bin/pip install aiohttp pillow
```
## Run
```bash
cd ~/ai-video/live-reconstruction
~/ai-video/lingbot-map/.venv/bin/python server_live.py \
    --model_path ~/ai-video/lingbot-map/checkpoints/lingbot-map/lingbot-map.pt
```
Then:
- Open `http://<host>:8080/` on your phone → tap **Start camera**.
- Open `http://<host>:8081/` on a desktop browser → interactive viser 3D viewer.
## Constraints
- Needs a CUDA GPU; tested on RTX 3060 12 GB.
- Peak VRAM ~10 GB with bfloat16 + SDPA fallback (FlashInfer not installed).
- Throughput on 3060: ~2 frames/s. The mobile page throttles to 2 FPS by default.
- `getUserMedia` requires a secure context (HTTPS) except on localhost, so expose the phone page over LAN or VPN only for now.
- Free up GPU memory before launching: stop `ollama`, ComfyUI, fish-speech, etc.
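Since inference tops out around 2 frames/s, frames arriving faster than that are best dropped rather than queued (a queue would only add latency). A sketch of a simple server-side admission throttle under that assumption; the actual server may coalesce or queue frames differently, and the client-side 2 FPS cap already does most of this work:

```python
import time


class FrameThrottle:
    """Admit at most `max_fps` frames; the caller discards rejected ones.

    Sketch of the drop-excess-frames policy implied by the ~2 frames/s
    budget, not the repo's actual implementation.
    """

    def __init__(self, max_fps=2.0, clock=time.monotonic):
        self.min_interval = 1.0 / max_fps
        self.clock = clock  # injectable for testing
        self.last = float("-inf")

    def admit(self) -> bool:
        now = self.clock()
        if now - self.last >= self.min_interval:
            self.last = now
            return True
        return False  # too soon: caller should drop this frame
```

Dropping at the door keeps the reconstruction near real time even when the phone briefly bursts above the target rate.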
## Env
- `LINGBOT_MAP_DIR` — override path to the upstream lingbot-map checkout
(default: `../lingbot-map`).
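The lookup described above can be sketched as follows (the function name is illustrative; `server_live.py` may resolve the path differently):

```python
import os
from pathlib import Path


def resolve_lingbot_map_dir(repo_root: Path) -> Path:
    """Return the upstream lingbot-map checkout directory.

    Uses LINGBOT_MAP_DIR when set, otherwise falls back to the sibling
    directory ../lingbot-map, per the README's documented default.
    """
    override = os.environ.get("LINGBOT_MAP_DIR")
    if override:
        return Path(override).expanduser().resolve()
    return (repo_root / ".." / "lingbot-map").resolve()
```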
## License
Code in this repo: MIT. Upstream model code: see Robbyant/lingbot-map.