floppyrj45/lingbot-map

LingBot-Map: Geometric Context Transformer for Streaming 3D Reconstruction

https://github.com/user-attachments/assets/fe39e095-af2c-4ec9-b68d-a8ba97e505ab

Quick Start

Installation

1. Create conda environment

conda create -n lingbot-map python=3.10 -y
conda activate lingbot-map

2. Install PyTorch (CUDA 12.8)

pip install torch==2.9.1 torchvision==0.24.1 --index-url https://download.pytorch.org/whl/cu128

For other CUDA versions, see PyTorch Get Started.
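To confirm that the installed build matches the CUDA version targeted above, a quick check along these lines can help. This is a minimal sketch, not part of the repository; the helper name `describe_torch` is hypothetical, and it only reads attributes that PyTorch itself exposes.

```python
import importlib.util


def describe_torch():
    """Return basic facts about the installed PyTorch build, or None if absent."""
    # find_spec avoids an ImportError when torch has not been installed yet.
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    return {
        "torch": torch.__version__,        # installed PyTorch version string
        "cuda_build": torch.version.cuda,  # CUDA version of the wheel (None on CPU-only builds)
        "cuda_available": torch.cuda.is_available(),  # True if a usable GPU is visible
    }


if __name__ == "__main__":
    print(describe_torch())
```

If `cuda_build` does not match the CUDA version of the wheel index you installed from, reinstalling from the matching index is usually the first thing to try.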

3. Install lingbot-map

pip install -e .

4. Install FlashInfer (recommended)

FlashInfer provides paged KV cache attention for efficient streaming inference:

# CUDA 12.8 + PyTorch 2.9
pip install flashinfer-python -i https://flashinfer.ai/whl/cu128/torch2.9/

For other CUDA/PyTorch combinations, see the FlashInfer installation guide. If FlashInfer is not installed, pass --use_sdpa so the model falls back to SDPA (PyTorch's native scaled dot-product attention).
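The choice between FlashInfer and the SDPA fallback can be made programmatically when launching the demo from a script. The sketch below assumes that the flashinfer-python package is importable as `flashinfer`; the helper names `attention_flags` and `demo_command` are hypothetical and not part of this repository. Only the `--model_path`, `--image_folder`, and `--use_sdpa` flags documented in this README are used.

```python
import importlib.util


def attention_flags(flashinfer_available=None):
    """Return extra demo.py flags depending on whether FlashInfer can be imported."""
    if flashinfer_available is None:
        # Assumption: the flashinfer-python package installs a module named `flashinfer`.
        flashinfer_available = importlib.util.find_spec("flashinfer") is not None
    # --use_sdpa is the flag this README documents for the SDPA fallback.
    return [] if flashinfer_available else ["--use_sdpa"]


def demo_command(model_path, image_folder, flashinfer_available=None):
    """Build the argument list for the image-folder demo described in this README."""
    return [
        "python", "demo.py",
        "--model_path", model_path,
        "--image_folder", image_folder,
        *attention_flags(flashinfer_available),
    ]


if __name__ == "__main__":
    # Prints the command line; pass the list to subprocess.run to execute it.
    print(" ".join(demo_command("/path/to/checkpoint.pt", "/path/to/images/")))
```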

5. Visualization dependencies (optional)

pip install -e ".[vis]"

Model Download

Model Name	Hugging Face Repository	ModelScope Repository	Description
lingbot-map	robbyant/lingbot-map	Robbyant/lingbot-map	Base model checkpoint (4.63 GB)
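One way to fetch the Hugging Face repository listed above is with `snapshot_download` from the `huggingface_hub` package. This is a sketch under assumptions: `huggingface_hub` must be installed separately, the target directory `checkpoints/lingbot-map` is an arbitrary choice, and the helper names are hypothetical. The checkpoint's file name inside the repository is not stated here, so the sketch lists the downloaded files instead of guessing it.

```python
from pathlib import Path


def download_lingbot_map(local_dir="checkpoints/lingbot-map"):
    """Download the robbyant/lingbot-map repository and return its local path."""
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    path = snapshot_download(repo_id="robbyant/lingbot-map", local_dir=local_dir)
    return Path(path)


def list_files(root):
    """Return all files under root as sorted paths relative to root."""
    root = Path(root)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


# Example usage (downloads several GB, so it is left commented out):
# root = download_lingbot_map()
# print(list_files(root))
```

The path of the downloaded checkpoint file can then be passed to demo.py through --model_path.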

Demo

Streaming Inference from Images

python demo.py --model_path /path/to/checkpoint.pt \
    --image_folder /path/to/images/

Streaming Inference from Video

python demo.py --model_path /path/to/checkpoint.pt \
    --video_path video.mp4 --fps 10

Streaming with Keyframe Interval

Use --keyframe_interval to reduce KV cache memory by keeping only every N-th frame as a keyframe. Non-keyframes still produce predictions but are not stored in the cache. This is useful for long sequences that exceed 320 frames.

python demo.py --model_path /path/to/checkpoint.pt \
    --image_folder /path/to/images/ --keyframe_interval 6
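To get a feel for the memory saving, the number of frames that stay in the cache can be estimated with a few lines of arithmetic. The sketch below assumes that frame 0 and every N-th frame after it are treated as keyframes; the actual selection rule in demo.py may differ, and the function name `keyframe_indices` is hypothetical.

```python
def keyframe_indices(num_frames, keyframe_interval):
    """Indices of frames kept in the cache, assuming frame 0 starts the pattern."""
    if keyframe_interval < 1:
        raise ValueError("keyframe_interval must be at least 1")
    # Assumption: keyframes are frames 0, N, 2N, ...; all other frames are
    # processed but not stored in the KV cache.
    return list(range(0, num_frames, keyframe_interval))


if __name__ == "__main__":
    kept = keyframe_indices(320, 6)
    print(len(kept), "of 320 frames cached")  # 54 of 320 frames cached
```

Under this assumption, an interval of 6 keeps roughly one sixth of the frames in the cache, while every frame still receives a prediction.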

Windowed Inference (for long sequences, >3000 frames)

python demo.py --model_path /path/to/checkpoint.pt \
    --video_path video.mp4 --fps 10 \
    --mode windowed --window_size 64
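The frame counts mentioned in this README (320 for keyframe intervals, 3000 for windowed mode) can be turned into a small helper that suggests flags for a given sequence length. This is only a sketch: the thresholds are treated here as rules of thumb rather than hard limits, the defaults 6 and 64 are simply the values from the example commands, and the function name `suggest_flags` is hypothetical.

```python
def suggest_flags(num_frames, keyframe_interval=6, window_size=64):
    """Suggest demo.py flags for a sequence of num_frames frames."""
    if num_frames < 0:
        raise ValueError("num_frames must be non-negative")
    if num_frames > 3000:
        # Very long sequences: the README points to windowed inference.
        return ["--mode", "windowed", "--window_size", str(window_size)]
    if num_frames > 320:
        # Long sequences: keep only every N-th frame in the KV cache.
        return ["--keyframe_interval", str(keyframe_interval)]
    # Short sequences: plain streaming inference with no extra flags.
    return []


if __name__ == "__main__":
    for n in (100, 1000, 5000):
        print(n, suggest_flags(n))
```

The returned list can be appended to the base command, for example after the --video_path and --fps arguments.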

With Sky Masking

python demo.py --model_path /path/to/checkpoint.pt \
    --image_folder /path/to/images/ --mask_sky

Without FlashInfer (SDPA fallback)

python demo.py --model_path /path/to/checkpoint.pt \
    --image_folder /path/to/images/ --use_sdpa

License

This project is released under the Apache License 2.0. See the LICENSE.txt file for details.

Citation

@article{chen2026geometric,
  title={Geometric Context Transformer for Streaming 3D Reconstruction},
  author={Chen, Lin-Zhuo and Gao, Jian and Chen, Yihang and Cheng, Ka Leong and Sun, Yipengjing and Hu, Liangxiao and Xue, Nan and Zhu, Xing and Shen, Yujun and Yao, Yao and Xu, Yinghao},
  journal={arXiv preprint arXiv:2604.14141},
  year={2026}
}

Acknowledgments

This work builds upon several excellent open-source projects: