initial prototype: aiohttp + WebSocket + viser live reconstruction
66
README.md
Normal file
@@ -0,0 +1,66 @@
# live-reconstruction

Live 3D reconstruction from a mobile phone camera using
[lingbot-map](https://github.com/Robbyant/lingbot-map) in streaming mode.

```
Mobile browser ── getUserMedia ──> JPEG frames
               ── WebSocket ──────> aiohttp server (Python, this repo)
                                      ├── lingbot-map streaming inference (KV cache, bfloat16)
                                      └── viser scene update (rolling window of point clouds)

Desktop/tablet ───────────────> viser page (http://host:8081) = interactive 3D viewer
```
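
The "rolling window of point clouds" in the diagram can be pictured as a bounded queue: each new frame's points are appended and the oldest frame falls out once the window is full. The following is a minimal sketch of that idea only; `RollingPointWindow`, its `max_frames` default, and the plain-tuple point format are hypothetical and are not taken from `server_live.py`.

```python
from collections import deque
from typing import Deque, List, Sequence, Tuple

Point = Tuple[float, float, float]


class RollingPointWindow:
    """Hypothetical helper: keep point clouds for the last `max_frames` frames."""

    def __init__(self, max_frames: int = 30) -> None:
        if max_frames < 1:
            raise ValueError("max_frames must be at least 1")
        # A deque with maxlen discards its oldest entry when a new one
        # is appended to a full window.
        self._frames: Deque[List[Point]] = deque(maxlen=max_frames)

    def push(self, points: Sequence[Point]) -> None:
        """Add the point cloud predicted for one new frame."""
        self._frames.append(list(points))

    def merged(self) -> List[Point]:
        """Return every point still in the window, oldest frame first."""
        return [p for frame in self._frames for p in frame]

    def __len__(self) -> int:
        return len(self._frames)
```

Bounding the window by frame count keeps the amount of geometry sent to the viewer roughly constant, at the cost of forgetting older parts of the scene.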

## Setup

1. Check out this repo next to a working `lingbot-map` checkout:

   ```
   ~/ai-video/lingbot-map/         # clone of Robbyant/lingbot-map (with venv + pip install -e .)
   ~/ai-video/live-reconstruction/ # this repo
   ```

2. Download the model weights (once):

   ```bash
   cd ~/ai-video/lingbot-map
   .venv/bin/python -c "from huggingface_hub import snapshot_download; \
   snapshot_download('robbyant/lingbot-map', local_dir='./checkpoints/lingbot-map')"
   ```

3. Install extra deps in the lingbot-map venv:

   ```bash
   ~/ai-video/lingbot-map/.venv/bin/pip install aiohttp pillow
   ```

## Run

```bash
cd ~/ai-video/live-reconstruction
~/ai-video/lingbot-map/.venv/bin/python server_live.py \
    --model_path ~/ai-video/lingbot-map/checkpoints/lingbot-map/lingbot-map.pt
```

Then:

- Open `http://<host>:8080/` on your phone → tap **Start camera**.
- Open `http://<host>:8081/` on a desktop browser → interactive viser 3D viewer.

## Constraints

- Needs a CUDA GPU; tested on RTX 3060 12 GB.
- Peak VRAM ~10 GB with bfloat16 + SDPA fallback (FlashInfer not installed).
- Throughput on 3060: ~2 frames/s. The mobile page throttles to 2 FPS by default.
- `getUserMedia` requires HTTPS on WAN, so only LAN / VPN exposure is supported for now.
- Free up GPU memory before launching: stop `ollama`, ComfyUI, fish-speech, etc.
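
The 2 FPS throttle mentioned above lives in the mobile page, but the same rule is easy to express as a small rate gate: accept a frame only if enough time has passed since the last accepted one. The sketch below illustrates that logic in Python; `FrameThrottle` and its injectable `clock` parameter are hypothetical and do not describe code in this repo.

```python
import time
from typing import Callable, Optional


class FrameThrottle:
    """Hypothetical helper: accept at most `max_fps` frames per second."""

    def __init__(
        self,
        max_fps: float = 2.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        if max_fps <= 0:
            raise ValueError("max_fps must be positive")
        self._min_interval = 1.0 / max_fps
        self._clock = clock  # injectable so the gate can be tested without sleeping
        self._last_accepted: Optional[float] = None

    def accept(self) -> bool:
        """Return True if a frame arriving now should be processed."""
        now = self._clock()
        if (
            self._last_accepted is not None
            and now - self._last_accepted < self._min_interval
        ):
            return False  # too soon after the previous accepted frame: drop it
        self._last_accepted = now
        return True
```

Dropping frames rather than queueing them keeps latency bounded when inference runs at about the same rate as the sender, which is the situation described for the RTX 3060.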

## Env

- `LINGBOT_MAP_DIR`: override path to the upstream lingbot-map checkout
  (default: `../lingbot-map`).
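
As an illustration of how such an override is typically resolved, here is a minimal sketch. It assumes the default `../lingbot-map` is interpreted relative to this repo's directory; the function name `resolve_lingbot_map_dir` is hypothetical, and `server_live.py` may do this differently.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_lingbot_map_dir(
    repo_dir: Path,
    env: Optional[Mapping[str, str]] = None,
) -> Path:
    """Return the lingbot-map checkout path, honoring LINGBOT_MAP_DIR if set."""
    env = os.environ if env is None else env
    override = env.get("LINGBOT_MAP_DIR")
    if override:
        # An explicit override wins; "~" is expanded for convenience.
        return Path(override).expanduser().resolve()
    # Assumed default: a sibling directory of this repo.
    return (repo_dir / ".." / "lingbot-map").resolve()
```

A caller could pass `Path(__file__).parent` as `repo_dir` and then add the returned path to `sys.path` before importing the upstream package.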

## License

Code in this repo: MIT. Upstream model code: see Robbyant/lingbot-map.