diff --git a/.gitattributes b/.gitattributes
index a6344aa..596330e 100644
--- a/.gitattributes
+++ b/.gitattributes
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+assets/teaser.png filter=lfs diff=lfs merge=lfs -text
diff --git a/README.md b/README.md
index 7b95401..3d83d4c 100644
--- a/README.md
+++ b/README.md
@@ -1,3 +1,143 @@
----
-license: apache-2.0
----
+
+LingBot-Map: Geometric Context Transformer for Streaming 3D Reconstruction
+
+---
+
+# Quick Start
+
+## Installation
+
+**1. Create conda environment**
+
+```bash
+conda create -n lingbot-map python=3.10 -y
+conda activate lingbot-map
+```
+
+**2. Install PyTorch (CUDA 12.8)**
+
+```bash
+pip install torch==2.9.1 torchvision==0.24.1 --index-url https://download.pytorch.org/whl/cu128
+```
+
+> For other CUDA versions, see [PyTorch Get Started](https://pytorch.org/get-started/locally/).
+
+**3. Install lingbot-map**
+
+```bash
+pip install -e .
+```
+
+**4. Install FlashInfer (recommended)**
+
+FlashInfer provides paged KV cache attention for efficient streaming inference:
+
+```bash
+# CUDA 12.8 + PyTorch 2.9
+pip install flashinfer-python -i https://flashinfer.ai/whl/cu128/torch2.9/
+```
+
+> For other CUDA/PyTorch combinations, see [FlashInfer installation](https://docs.flashinfer.ai/installation.html).
+> If FlashInfer is not installed, pass `--use_sdpa` to fall back to SDPA (PyTorch's native scaled dot-product attention).
+
+**5. Visualization dependencies (optional)**
+
+```bash
+pip install -e ".[vis]"
+```
+
+# Demo
+
+## Streaming Inference from Images
+
+```bash
+python demo.py --model_path /path/to/checkpoint.pt \
+    --image_folder /path/to/images/
+```
+
+## Streaming Inference from Video
+
+```bash
+python demo.py --model_path /path/to/checkpoint.pt \
+    --video_path video.mp4 --fps 10
+```
+
+## Streaming with Keyframe Interval
+
+Use `--keyframe_interval` to reduce KV cache memory by keeping only every N-th frame as a keyframe. Non-keyframes still produce predictions but are not stored in the cache. This is useful for long sequences
+that exceed 320 frames.
+
+```bash
+python demo.py --model_path /path/to/checkpoint.pt \
+    --image_folder /path/to/images/ --keyframe_interval 6
+```
+
+## Windowed Inference (for long sequences, >3000 frames)
+
+```bash
+python demo.py --model_path /path/to/checkpoint.pt \
+    --video_path video.mp4 --fps 10 \
+    --mode windowed --window_size 64
+```
+
+## With Sky Masking
+
+```bash
+python demo.py --model_path /path/to/checkpoint.pt \
+    --image_folder /path/to/images/ --mask_sky
+```
+
+## Without FlashInfer (SDPA fallback)
+
+```bash
+python demo.py --model_path /path/to/checkpoint.pt \
+    --image_folder /path/to/images/ --use_sdpa
+```
+
+# Model Download
+
+| Model Name | Hugging Face Repository | Description |
+| :--- | :--- | :--- |
+| lingbot-map | | Base model checkpoint |
+
+# License
+
+This project is released under the Apache License 2.0. See the [LICENSE](LICENSE.txt) file for details.
+
+# Citation
+
+```bibtex
+@article{lingbot-map2026,
+  title={},
+  author={},
+  journal={arXiv preprint arXiv:},
+  year={2026}
+}
+```
+
+# Acknowledgments
+
+This work builds upon several excellent open-source projects:
+
+- [VGGT](https://github.com/facebookresearch/vggt)
+- [DINOv2](https://github.com/facebookresearch/dinov2)
+- [FlashInfer](https://github.com/flashinfer-ai/flashinfer)
+
+---
\ No newline at end of file
diff --git a/assets/teaser.png b/assets/teaser.png
new file mode 100644
index 0000000..77b6314
--- /dev/null
+++ b/assets/teaser.png
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d34377bdb2f0747442f3113692914e669e97cb1d474578711cc30d08c5618bcc
+size 5109745
diff --git a/lingbot-map.pt b/lingbot-map.pt
new file mode 100644
index 0000000..6e9d3c8
--- /dev/null
+++ b/lingbot-map.pt
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:986579f63db7bde3cb0f0ecc0a8fd49f5e4b6141a178ac33598d7fbe3e901cd0
+size 4632326476
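The README describes the `--keyframe_interval` behaviour as follows: every frame produces a prediction, but only every N-th frame is written to the KV cache. A minimal Python sketch can illustrate how that bounds cache growth. It is not lingbot-map's implementation. `ToyKVCache`, `is_keyframe`, and `stream` are hypothetical names, the "cache" stores frame indices instead of key/value tensors, and the sketch assumes frame 0 is the first keyframe.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ToyKVCache:
    """Hypothetical stand-in for a KV cache: stores frame indices, not tensors."""
    frames: List[int] = field(default_factory=list)

    def append(self, frame_idx: int) -> None:
        self.frames.append(frame_idx)


def is_keyframe(frame_idx: int, keyframe_interval: int) -> bool:
    # Assumption: frame 0 is a keyframe, then every keyframe_interval-th frame.
    return frame_idx % keyframe_interval == 0


def stream(
    num_frames: int, keyframe_interval: int = 1
) -> Tuple[List[Tuple[int, int]], ToyKVCache]:
    """Simulate streaming; returns (frame_idx, cached_keyframes_seen) per frame."""
    if keyframe_interval < 1:
        raise ValueError("keyframe_interval must be >= 1")
    cache = ToyKVCache()
    predictions: List[Tuple[int, int]] = []
    for idx in range(num_frames):
        # Every frame still yields a prediction; here we only record how many
        # cached keyframes existed when this frame was processed.
        predictions.append((idx, len(cache.frames)))
        # Only keyframes are written back into the cache.
        if is_keyframe(idx, keyframe_interval):
            cache.append(idx)
    return predictions, cache


if __name__ == "__main__":
    preds, cache = stream(320, keyframe_interval=6)
    print(len(preds), len(cache.frames))  # prints: 320 54
```

Under these assumptions, a 320-frame stream with `keyframe_interval=6` still yields 320 predictions but leaves only 54 entries in the cache. The cache therefore grows roughly N times more slowly than with `keyframe_interval=1`. The real model's keyframe selection and cache layout may differ.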