LingBot-Map: Geometric Context Transformer for Streaming 3D Reconstruction

--- # Quick Start ## Installation **1. Create conda environment** ```bash conda create -n lingbot-map python=3.10 -y conda activate lingbot-map ``` **2. Install PyTorch (CUDA 12.8)** ```bash pip install torch==2.9.1 torchvision==0.24.1 --index-url https://download.pytorch.org/whl/cu128 ``` > For other CUDA versions, see [PyTorch Get Started](https://pytorch.org/get-started/locally/). **3. Install lingbot-map** ```bash pip install -e . ``` **4. Install FlashInfer (recommended)** FlashInfer provides paged KV cache attention for efficient streaming inference: ```bash # CUDA 12.8 + PyTorch 2.9 pip install flashinfer-python -i https://flashinfer.ai/whl/cu128/torch2.9/ ``` > For other CUDA/PyTorch combinations, see [FlashInfer installation](https://docs.flashinfer.ai/installation.html). > If FlashInfer is not installed, the model falls back to SDPA (PyTorch native attention) via `--use_sdpa`. **5. Visualization dependencies (optional)** ```bash pip install -e ".[vis]" ``` # Demo ## Streaming Inference from Images ```bash python demo.py --model_path /path/to/checkpoint.pt \ --image_folder /path/to/images/ ``` ## Streaming Inference from Video ```bash python demo.py --model_path /path/to/checkpoint.pt \ --video_path video.mp4 --fps 10 ``` ## Streaming with Keyframe Interval Use `--keyframe_interval` to reduce KV cache memory by only keeping every N-th frame as a keyframe. Non-keyframe frames still produce predictions but are not stored in the cache. This is useful for long sequences which excesses 320 frames. ```bash python demo.py --model_path /path/to/checkpoint.pt \ --image_folder /path/to/images/ --keyframe_interval 6 ``` ## Windowed Inference (for long sequences, >3000 frames) ```bash python demo.py --model_path /path/to/checkpoint.pt \ --video_path video.mp4 --fps 10 \ --mode windowed --window_size 64 ``` ## With Sky Masking ```bash python demo.py --model_path /path/to/checkpoint.pt \ --image_folder /path/to/images/ --mask_sky ``` ## Without FlashInfer (SDPA fallback) ```bash python demo.py --model_path /path/to/checkpoint.pt \ --image_folder /path/to/images/ --use_sdpa ``` # Model Download | Model Name | Huggingface Repository | Description | | :--- | :--- | :--- | | lingbot-map | [robbyant/lingbot-map](https://huggingface.co/robbyant/lingbot-map) | Base model checkpoint (4.63 GB) | # License This project is released under the Apache License 2.0. See [LICENSE](LICENSE.txt) file for details. # Citation ```bibtex @article{lingbot-map2026, title={}, author={}, journal={arXiv preprint arXiv:}, year={2026} } ``` # Acknowledgments This work builds upon several excellent open-source projects: - [VGGT](https://github.com/facebookresearch/vggt) - [DINOv2](https://github.com/facebookresearch/dinov2) - [Flashinfer](https://github.com/flashinfer-ai/flashinfer) ---