From fff96f1b501d88cb01f3a83c2507e7a4f3278b5f Mon Sep 17 00:00:00 2001
From: Lin-Zhuo Chen
Date: Thu, 16 Apr 2026 07:30:46 +0000
Subject: [PATCH] Upload folder using huggingface_hub

---
 README.md | 109 +++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 70 insertions(+), 39 deletions(-)

diff --git a/README.md b/README.md
index 3d83d4c..b507a28 100644
--- a/README.md
+++ b/README.md
@@ -1,23 +1,38 @@
-

LingBot-Map: Geometric Context Transformer for Streaming 3D Reconstruction


LingBot-Map: Geometric Context Transformer for Streaming 3D Reconstruction

+
+Robbyant Team
+
+[![Paper](https://img.shields.io/static/v1?label=Paper&message=arXiv&color=red&logo=arxiv)](https://arxiv.org/abs/2604.14141)
+[![PDF](https://img.shields.io/static/v1?label=Paper&message=PDF&color=red&logo=adobeacrobatreader)](lingbot-map_paper.pdf)
+[![Project](https://img.shields.io/badge/Project-Website-blue)](https://technology.robbyant.com/lingbot-map)
+[![HuggingFace](https://img.shields.io/static/v1?label=%F0%9F%A4%97%20Model&message=HuggingFace&color=orange)](https://huggingface.co/robbyant/lingbot-map)
+[![ModelScope](https://img.shields.io/static/v1?label=%F0%9F%A4%96%20Model&message=ModelScope&color=purple)](https://www.modelscope.cn/models/Robbyant/lingbot-map)
+[![License](https://img.shields.io/badge/License-Apache--2.0-green)](LICENSE.txt)
+
+
+https://github.com/user-attachments/assets/fe39e095-af2c-4ec9-b68d-a8ba97e505ab
+
+-----
+
+### πŸ—ΊοΈ Meet LingBot-Map! We've built a feed-forward 3D foundation model for streaming 3D reconstruction! πŸ—οΈπŸŒ
+
+LingBot-Map focuses on:
+
+- **Geometric Context Transformer**: Architecturally unifies coordinate grounding, dense geometric cues, and long-range drift correction within a single streaming framework through anchor context, a pose-reference window, and trajectory memory.
+- **High-Efficiency Streaming Inference**: A feed-forward architecture with paged KV cache attention, enabling stable inference at ~20 FPS on 518Γ—378 resolution over long sequences exceeding 10,000 frames.
+- **State-of-the-Art Reconstruction**: Superior performance on diverse benchmarks compared to both existing streaming and iterative optimization-based approaches.
 
 ---
 
-# Quick Start
+# βš™οΈ Quick Start
 
 ## Installation
@@ -60,23 +75,29 @@ pip install flashinfer-python -i https://flashinfer.ai/whl/cu128/torch2.9/
 pip install -e ".[vis]"
 ```
 
-# Demo
+# πŸ“¦ Model Download
 
-## Streaming Inference from Images
+| Model Name | Huggingface Repository | ModelScope Repository | Description |
+| :--- | :--- | :--- | :--- |
+| lingbot-map | [robbyant/lingbot-map](https://huggingface.co/robbyant/lingbot-map) | [Robbyant/lingbot-map](https://www.modelscope.cn/models/Robbyant/lingbot-map) | Base model checkpoint (4.63 GB) |
+
+# 🎬 Demo
+
+### Streaming Inference from Images
 ```bash
 python demo.py --model_path /path/to/checkpoint.pt \
     --image_folder /path/to/images/
 ```
 
-## Streaming Inference from Video
+### Streaming Inference from Video
 ```bash
 python demo.py --model_path /path/to/checkpoint.pt \
     --video_path video.mp4 --fps 10
 ```
 
-## Streaming with Keyframe Interval
+### Streaming with Keyframe Interval
 Use `--keyframe_interval` to reduce KV cache memory by keeping only every N-th frame as a keyframe. Non-keyframe frames still produce predictions but are not stored in the cache.
 This is useful for long sequences that exceed 320 frames.
@@ -86,7 +107,7 @@ python demo.py --model_path /path/to/checkpoint.pt \
     --image_folder /path/to/images/ --keyframe_interval 6
 ```
 
-## Windowed Inference (for long sequences, >3000 frames)
+### Windowed Inference (for long sequences, >3000 frames)
 ```bash
 python demo.py --model_path /path/to/checkpoint.pt \
     --video_path video.mp4 --fps 10 \
@@ -94,45 +115,55 @@ python demo.py --model_path /path/to/checkpoint.pt \
 ```
 
-## With Sky Masking
+### Sky Masking
+
+Sky masking uses an ONNX sky segmentation model to filter out sky points from the reconstructed point cloud, which improves visualization quality for outdoor scenes.
+
+**Setup:**
+
+```bash
+# Install onnxruntime (required)
+pip install onnxruntime      # CPU
+# or
+pip install onnxruntime-gpu  # GPU (faster for large image sets)
+```
+
+The sky segmentation model (`skyseg.onnx`) is downloaded automatically from [HuggingFace](https://huggingface.co/JianyuanWang/skyseg/resolve/main/skyseg.onnx) on first use.
+
+**Usage:**
 ```bash
 python demo.py --model_path /path/to/checkpoint.pt \
     --image_folder /path/to/images/ --mask_sky
 ```
 
-## Without FlashInfer (SDPA fallback)
+Sky masks are cached in `_sky_masks/`, so subsequent runs skip regeneration.
+
+### Without FlashInfer (SDPA fallback)
 ```bash
 python demo.py --model_path /path/to/checkpoint.pt \
     --image_folder /path/to/images/ --use_sdpa
 ```
 
-# Model Download
-
-
-
-| Model Name | Huggingface Repository | Description |
-| :--- | :--- | :--- |
-| lingbot-map | | Base model checkpoint |
-
-
-# License
+# πŸ“œ License
 
 This project is released under the Apache License 2.0. See the [LICENSE](LICENSE.txt) file for details.
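[Editor's note] The `--keyframe_interval` behavior described in the demo section — keep every N-th frame in the KV cache while still producing predictions for all frames — can be sketched as follows. This is an illustrative sketch only: `keyframe_indices` is a hypothetical helper (frames assumed 0-indexed), not part of the repo's API.

```python
def keyframe_indices(num_frames: int, interval: int) -> list[int]:
    """Frame indices retained in the KV cache when every `interval`-th
    frame is treated as a keyframe (hypothetical helper, 0-indexed)."""
    if interval < 1:
        raise ValueError("interval must be >= 1")
    return list(range(0, num_frames, interval))


# With --keyframe_interval 6, a 320-frame sequence would cache
# only 54 keyframes instead of all 320 — roughly a 6x memory saving.
print(len(keyframe_indices(320, 6)))  # prints 54
```

Non-keyframes are still processed and still yield predictions; they are simply never appended to the cache, which is why memory stays bounded on sequences past the 320-frame mark.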
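[Editor's note] The mask caching noted in the Sky Masking section can be sketched like this; both helper names and the per-image file layout inside `_sky_masks/` are assumptions for illustration, not the repo's actual scheme.

```python
from pathlib import Path


def sky_mask_path(image_path: str, cache_dir: str = "_sky_masks") -> Path:
    """Hypothetical cache location for one image's sky mask."""
    return Path(cache_dir) / (Path(image_path).stem + ".png")


def mask_is_cached(image_path: str, cache_dir: str = "_sky_masks") -> bool:
    """True when a mask file already exists, so a rerun can skip
    regeneration and go straight to filtering out sky points."""
    return sky_mask_path(image_path, cache_dir).exists()
```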
-# Citation
+# πŸ“– Citation
 
 ```bibtex
-@article{lingbot-map2026,
-  title={},
-  author={},
-  journal={arXiv preprint arXiv:},
+@article{chen2026geometric,
+  title={Geometric Context Transformer for Streaming 3D Reconstruction},
+  author={Chen, Lin-Zhuo and Gao, Jian and Chen, Yihang and Cheng, Ka Leong and Sun, Yipengjing and Hu, Liangxiao and Xue, Nan and Zhu, Xing and Shen, Yujun and Yao, Yao and Xu, Yinghao},
+  journal={arXiv preprint arXiv:2604.14141},
   year={2026}
 }
 ```
 
-# Acknowledgments
+# ✨ Acknowledgments
+
+We thank Shangzhan Zhang, Jianyuan Wang, Yudong Jin, Christian Rupprecht, and Xun Cao for their helpful discussions and support.
 
 This work builds upon several excellent open-source projects:
@@ -140,4 +171,4 @@ This work builds upon several excellent open-source projects:
 - [DINOv2](https://github.com/facebookresearch/dinov2)
 - [Flashinfer](https://github.com/flashinfer-ai/flashinfer)
 
----
\ No newline at end of file
+---