LingBot-Map: Geometric Context Transformer for Streaming 3D Reconstruction
https://github.com/user-attachments/assets/fe39e095-af2c-4ec9-b68d-a8ba97e505ab
Quick Start
Installation
1. Create conda environment
conda create -n lingbot-map python=3.10 -y
conda activate lingbot-map
2. Install PyTorch (CUDA 12.8)
pip install torch==2.9.1 torchvision==0.24.1 --index-url https://download.pytorch.org/whl/cu128
For other CUDA versions, see PyTorch Get Started.
3. Install lingbot-map
pip install -e .
4. Install FlashInfer (recommended)
FlashInfer provides paged KV cache attention for efficient streaming inference:
# CUDA 12.8 + PyTorch 2.9
pip install flashinfer-python -i https://flashinfer.ai/whl/cu128/torch2.9/
For other CUDA/PyTorch combinations, see FlashInfer installation. If FlashInfer is not installed, the model falls back to SDPA (PyTorch native attention) via --use_sdpa.
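The fallback can be scripted. The following is a minimal sketch, assuming the flashinfer-python package is importable as `flashinfer` and that `--use_sdpa` has to be passed explicitly when it is missing. The helper name `build_demo_command` is hypothetical and not part of this repository.

```python
import importlib.util
import shlex


def build_demo_command(model_path: str, image_folder: str) -> list[str]:
    """Build a demo.py command line, adding --use_sdpa when FlashInfer is absent.

    Hypothetical helper: only the flags documented in this README are used.
    """
    cmd = [
        "python", "demo.py",
        "--model_path", model_path,
        "--image_folder", image_folder,
    ]
    # Assumption: the flashinfer-python wheel installs a top-level
    # module named "flashinfer". find_spec returns None if it is missing.
    if importlib.util.find_spec("flashinfer") is None:
        cmd.append("--use_sdpa")
    return cmd


# Print the command that would be run on this machine.
print(shlex.join(build_demo_command("/path/to/checkpoint.pt", "/path/to/images/")))
```

The list form can be handed to `subprocess.run` unchanged, which avoids shell quoting issues with paths that contain spaces.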
5. Visualization dependencies (optional)
pip install -e ".[vis]"
Model Download
| Model Name | Huggingface Repository | ModelScope Repository | Description |
|---|---|---|---|
| lingbot-map | robbyant/lingbot-map | Robbyant/lingbot-map | Base model checkpoint (4.63 GB) |
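One possible way to fetch the checkpoint from the Hugging Face repository listed above is the `huggingface_hub` package (installed separately with `pip install huggingface_hub`). This is a sketch under assumptions: the target directory and the function name `download_checkpoint` are hypothetical, and the `*.pt` pattern is only inferred from the `/path/to/checkpoint.pt` placeholder used in the demo commands.

```python
from pathlib import Path

REPO_ID = "robbyant/lingbot-map"  # Hugging Face repository from the table above


def download_checkpoint(local_dir: str = "checkpoints/lingbot-map") -> list[Path]:
    """Download the repository snapshot and list candidate checkpoint files.

    Hypothetical helper: local_dir is an arbitrary choice, and the actual
    checkpoint filename inside the repository is not assumed here.
    """
    # Imported lazily so this module loads even without huggingface_hub.
    from huggingface_hub import snapshot_download

    snapshot_path = snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
    # Assumption: the checkpoint uses a .pt extension, as in the demo commands.
    return sorted(Path(snapshot_path).rglob("*.pt"))
```

Calling `download_checkpoint()` returns the matching files, and one of those paths can then be passed to `--model_path`.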
Demo
Streaming Inference from Images
python demo.py --model_path /path/to/checkpoint.pt \
--image_folder /path/to/images/
Streaming Inference from Video
python demo.py --model_path /path/to/checkpoint.pt \
--video_path video.mp4 --fps 10
Streaming with Keyframe Interval
Use --keyframe_interval to reduce KV cache memory by keeping only every N-th frame as a keyframe. Non-keyframes still produce predictions but are not stored in the cache. This is useful for long sequences that exceed 320 frames.
python demo.py --model_path /path/to/checkpoint.pt \
--image_folder /path/to/images/ --keyframe_interval 6
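To get a feel for the memory effect, the sketch below counts how many frames would stay in the cache for a given interval. It illustrates the every-N-th-frame rule only and is not the project's implementation. It assumes the first frame (index 0) is a keyframe, which the text above does not state, and `cached_frame_indices` is a hypothetical name.

```python
def cached_frame_indices(num_frames: int, keyframe_interval: int) -> list[int]:
    """Return the indices of frames that would be stored in the KV cache.

    Assumption: frame 0 is a keyframe, followed by every N-th frame.
    All frames still produce predictions; only these are cached.
    """
    if keyframe_interval < 1:
        raise ValueError("keyframe_interval must be >= 1")
    return list(range(0, num_frames, keyframe_interval))


num_frames = 960  # example sequence length, chosen for illustration
for interval in (1, 6):
    kept = cached_frame_indices(num_frames, interval)
    print(f"interval={interval}: {len(kept)} of {num_frames} frames cached")
```

Under these assumptions, an interval of 6 caches 160 of 960 frames, so the number of cached frames shrinks roughly by the interval factor.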
Windowed Inference (for long sequences, >3000 frames)
python demo.py --model_path /path/to/checkpoint.pt \
--video_path video.mp4 --fps 10 \
--mode windowed --window_size 64
With Sky Masking
python demo.py --model_path /path/to/checkpoint.pt \
--image_folder /path/to/images/ --mask_sky
Without FlashInfer (SDPA fallback)
python demo.py --model_path /path/to/checkpoint.pt \
--image_folder /path/to/images/ --use_sdpa
License
This project is released under the Apache License 2.0. See the LICENSE file for details.
Citation
@article{chen2026geometric,
title={Geometric Context Transformer for Streaming 3D Reconstruction},
author={Chen, Lin-Zhuo and Gao, Jian and Chen, Yihang and Cheng, Ka Leong and Sun, Yipengjing and Hu, Liangxiao and Xue, Nan and Zhu, Xing and Shen, Yujun and Yao, Yao and Xu, Yinghao},
journal={arXiv preprint arXiv:2604.14141},
year={2026}
}
Acknowledgments
This work builds upon several excellent open-source projects:
