Modern autonomous driving requires metric 3D understanding that is accurate within each camera and spatially consistent across a surround-view rig. SurroundNEXO grounds cross-view reasoning in ego-centric geometry, uses sparse LiDAR observations as metric anchors, and progressively expands feature interaction from view-local modeling to global integration.
Given surround-view images, camera rigs, and sparse LiDAR, SurroundNEXO grounds images with ego-frame ray priors and sparse metric anchors, then progressively expands feature interaction to produce metric and spatially consistent depth predictions.
Ego-Ray Positional Encoding (ERPE) maps image patches to globally comparable ego-frame viewing directions.
Sparse Metric Anchoring propagates absolute scale from sparse LiDAR observations to dense visual tokens.
Progressive Geometry Transformer schedules interaction from intra-view to cross-view, cross-frame, and global reasoning.
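As a rough illustration of the ego-ray idea described above, the following sketch computes one unit viewing direction per image patch and expresses it in the ego frame, so that patches from different cameras share a common directional reference. It assumes a distortion-free pinhole model; the function name `ego_ray_directions`, the patch size, and the axis conventions are hypothetical, and the source does not specify how the directions are subsequently embedded into tokens.

```python
import numpy as np

def ego_ray_directions(K, R_cam_to_ego, image_hw, patch=16):
    """Unit viewing directions, in the ego frame, for every patch center.

    K            : (3, 3) pinhole intrinsics (assumption: no lens distortion)
    R_cam_to_ego : (3, 3) rotation taking camera-frame vectors to the ego frame
    image_hw     : (height, width) of the image in pixels
    patch        : patch side length in pixels (hypothetical default)
    """
    h, w = image_hw
    # Pixel coordinates of the patch centers along each axis.
    us = np.arange(patch / 2, w, patch)
    vs = np.arange(patch / 2, h, patch)
    uu, vv = np.meshgrid(us, vs)                          # each (H_p, W_p)
    pix = np.stack([uu, vv, np.ones_like(uu)], axis=-1)   # homogeneous pixels

    # Back-project through the intrinsics: ray_cam = K^{-1} [u, v, 1]^T.
    rays_cam = pix @ np.linalg.inv(K).T
    # Rotate into the ego frame; translation does not affect directions.
    rays_ego = rays_cam @ R_cam_to_ego.T
    rays_ego /= np.linalg.norm(rays_ego, axis=-1, keepdims=True)
    return rays_ego                                       # (H_p, W_p, 3)


# Toy example: a front camera whose optical axis (camera +z) points
# along ego +x (assumed convention: ego x forward, y left, z up).
K_demo = np.array([[2.0, 0.0, 1.0],
                   [0.0, 2.0, 1.0],
                   [0.0, 0.0, 1.0]])
R_front = np.array([[0.0, 0.0, 1.0],
                    [-1.0, 0.0, 0.0],
                    [0.0, -1.0, 0.0]])
rays_demo = ego_ray_directions(K_demo, R_front, image_hw=(4, 4), patch=2)
```

Because the directions live in the ego frame rather than in each camera's own frame, two patches from adjacent cameras that look at the same part of the scene receive nearly identical directions, which is the property the "globally comparable" wording refers to.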
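Across depth accuracy, cross-view consistency, sparse-prompt robustness, and 3D reconstruction, SurroundNEXO improves metric geometry in low-overlap autonomous driving scenes, achieving the best score on most metrics in the tables below; KITTI is the main exception, where some baselines score better.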
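For readers unfamiliar with the depth metrics reported in the tables, the sketch below computes Abs Rel, Sq Rel, RMSE, and δ<1.25 using their standard definitions. The exact evaluation protocol (depth range, valid-pixel masking, cropping) is not given in this page, so those details are assumptions here. The optional `align_scale` flag uses median scaling as one common way to evaluate scale-ambiguous predictions; whether this matches the "Scale" entries in the Alignment column is also an assumption.

```python
import numpy as np

def depth_metrics(pred, gt, align_scale=False):
    """Standard depth-estimation metrics over pixels with valid ground truth.

    pred, gt    : arrays of the same shape, depths in meters (pred must be > 0)
    align_scale : if True, rescale pred by the ratio of medians before scoring
                  (an assumed stand-in for scale-aligned evaluation)
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    valid = gt > 0                      # assumption: 0 marks missing ground truth
    pred, gt = pred[valid], gt[valid]

    if align_scale:
        pred = pred * (np.median(gt) / np.median(pred))

    err = pred - gt
    ratio = np.maximum(pred / gt, gt / pred)
    return {
        "abs_rel": float(np.mean(np.abs(err) / gt)),   # lower is better
        "sq_rel": float(np.mean(err ** 2 / gt)),       # lower is better
        "rmse": float(np.sqrt(np.mean(err ** 2))),     # lower is better
        "delta1": float(np.mean(ratio < 1.25)),        # higher is better
    }


# Toy example: two valid pixels and one pixel without ground truth.
metrics_demo = depth_metrics(pred=[11.0, 18.0, 5.0], gt=[10.0, 20.0, 0.0])
```

Abs Rel and δ<1.25 are relative measures, so they weight near and far pixels similarly, while RMSE is dominated by errors at long range.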
| Method | Alignment | Waymo Abs Rel ↓ | Waymo δ<1.25 ↑ | NuScenes Abs Rel ↓ | NuScenes δ<1.25 ↑ | KITTI Abs Rel ↓ | KITTI δ<1.25 ↑ | DDAD Abs Rel ↓ | DDAD δ<1.25 ↑ | OpenScene Abs Rel ↓ | OpenScene δ<1.25 ↑ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Single-view | |||||||||||
| Depth-Pro | Metric | 0.390 | 0.230 | 0.236 | 0.522 | 0.135 | 0.876 | 0.450 | 0.176 | 0.259 | 0.477 |
| UniDepth-2 | Metric | 0.147 | 0.829 | 0.120 | 0.934 | 0.071 | 0.962 | 0.091 | 0.941 | 0.142 | 0.913 |
| MoGe-2 | Metric | 0.177 | 0.630 | 0.129 | 0.851 | 0.240 | 0.273 | 0.124 | 0.796 | 0.094 | 0.937 |
| MetricAny. | Metric | 0.097 | 0.943 | 0.160 | 0.851 | 0.151 | 0.710 | 0.085 | 0.944 | 0.152 | 0.860 |
| Multi-view | |||||||||||
| VGGT | Scale | 0.134 | 0.857 | 0.213 | 0.773 | 0.092 | 0.941 | 0.185 | 0.707 | 0.178 | 0.808 |
| Pi3 | Scale | 0.096 | 0.914 | 0.122 | 0.869 | 0.056 | 0.972 | 0.139 | 0.802 | 0.128 | 0.889 |
| OmniVGGT | Scale | 0.104 | 0.925 | 0.194 | 0.762 | 0.095 | 0.927 | 0.195 | 0.682 | 0.267 | 0.814 |
| MapAny. | Metric | 0.096 | 0.911 | 0.115 | 0.879 | 0.096 | 0.922 | 0.114 | 0.867 | 0.115 | 0.870 |
| DVGT | Metric | 0.227 | 0.615 | 0.127 | 0.871 | 0.068 | 0.969 | 0.097 | 0.922 | 0.110 | 0.909 |
| DA3-G | Scale | 0.172 | 0.789 | 0.147 | 0.827 | 0.063 | 0.969 | 0.142 | 0.812 | 0.132 | 0.875 |
| DA3-G† | Metric | 0.082 | 0.937 | 0.112 | 0.911 | 0.102 | 0.929 | 0.114 | 0.906 | 0.105 | 0.916 |
| Ours | Metric | 0.048 | 0.975 | 0.079 | 0.950 | 0.070 | 0.969 | 0.077 | 0.956 | 0.084 | 0.938 |

| Method | NuScenes Abs Rel ↓ | NuScenes Sq Rel ↓ | NuScenes RMSE ↓ | NuScenes δ<1.25 ↑ | DDAD Abs Rel ↓ | DDAD Sq Rel ↓ | DDAD RMSE ↓ | DDAD δ<1.25 ↑ |
|---|---|---|---|---|---|---|---|---|
| Single-view | ||||||||
| Depth-Pro | 0.353 | 3.482 | 6.461 | 0.476 | 0.625 | 6.654 | 7.772 | 0.276 |
| UniDepth-2 | 0.198 | 2.691 | 6.216 | 0.341 | 0.196 | 2.228 | 6.205 | 0.763 |
| MoGe-2 | 0.235 | 2.683 | 6.281 | 0.680 | 0.347 | 3.598 | 7.274 | 0.544 |
| MetricAny. | 0.208 | 2.512 | 6.156 | 0.728 | 0.198 | 2.340 | 5.834 | 0.799 |
| Multi-view | ||||||||
| VGGT | 0.220 | 2.272 | 5.608 | 0.660 | 0.303 | 2.522 | 5.680 | 0.629 |
| Pi3 | 0.211 | 2.387 | 5.959 | 0.707 | 0.321 | 3.571 | 6.865 | 0.604 |
| OmniVGGT | 0.210 | 1.977 | 4.671 | 0.660 | 0.269 | 2.083 | 5.336 | 0.562 |
| MapAny. | 0.328 | 5.348 | 7.212 | 0.741 | 0.490 | 8.889 | 8.428 | 0.682 |
| DVGT | 0.229 | 2.629 | 5.949 | 0.705 | 0.241 | 2.275 | 5.994 | 0.712 |
| DA3-G | 0.217 | 2.111 | 5.443 | 0.684 | 0.314 | 3.633 | 5.945 | 0.576 |
| DA3-G† | 0.173 | 1.897 | 5.132 | 0.801 | 0.238 | 2.218 | 5.589 | 0.769 |
| Ours | 0.139 | 1.654 | 4.527 | 0.850 | 0.162 | 1.840 | 5.333 | 0.837 |

| Method | Waymo Avg. Abs Rel ↓ | Waymo Avg. RMSE ↓ | Waymo Lidar-4 Abs Rel ↓ | Waymo Lidar-4 RMSE ↓ | Waymo Random-0.1% Abs Rel ↓ | Waymo Random-0.1% RMSE ↓ | NuScenes Avg. Abs Rel ↓ | NuScenes Avg. RMSE ↓ | NuScenes Lidar-4 Abs Rel ↓ | NuScenes Lidar-4 RMSE ↓ | NuScenes Random-0.1% Abs Rel ↓ | NuScenes Random-0.1% RMSE ↓ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Omni-DC | 0.133 | 4.298 | 0.211 | 6.534 | 0.208 | 5.592 | 0.287 | 6.555 | 0.158 | 4.163 | 0.754 | 15.056 |
| PromptDA | 0.495 | 14.271 | 0.503 | 14.334 | 0.453 | 12.785 | 0.507 | 12.047 | 0.507 | 11.717 | 0.538 | 13.331 |
| PriorDA | 0.054 | 2.258 | 0.057 | 2.455 | 0.053 | 2.336 | 0.084 | 2.567 | 0.075 | 2.280 | 0.139 | 3.976 |
| LB-Depth | 0.126 | 4.433 | 0.069 | 3.544 | 0.314 | 7.780 | 0.173 | 5.280 | 0.119 | 4.937 | 0.327 | 6.560 |
| Any2Full | 0.090 | 2.850 | 0.117 | 2.484 | 0.325 | 6.719 | 0.542 | 6.946 | 0.080 | 2.754 | 1.927 | 19.469 |
| Ours | 0.053 | 1.946 | 0.054 | 1.989 | 0.049 | 1.843 | 0.087 | 2.005 | 0.083 | 1.893 | 0.103 | 2.343 |

| Method | Alignment | Waymo Acc. ↓ | Waymo Comp. ↓ | NuScenes Acc. ↓ | NuScenes Comp. ↓ | KITTI Acc. ↓ | KITTI Comp. ↓ | DDAD Acc. ↓ | DDAD Comp. ↓ | OpenScene Acc. ↓ | OpenScene Comp. ↓ | Time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| VGGT | Scale | 0.554 | 0.290 | 0.603 | 0.346 | 0.182 | 0.258 | 0.410 | 0.272 | 0.246 | 0.219 | ~55.62s |
| Pi3 | Scale | 0.585 | 0.258 | 0.400 | 0.249 | 0.144 | 0.111 | 0.310 | 0.172 | 0.249 | 0.171 | ~35.08s |
| OmniVGGT | Scale | 0.525 | 0.387 | 0.526 | 0.430 | 0.176 | 0.326 | 0.409 | 0.513 | 0.268 | 0.432 | ~45.59s |
| DVGT | Metric | 0.756 | 0.481 | 1.078 | 0.808 | 0.522 | 0.743 | 0.371 | 0.705 | 0.306 | 0.381 | ~19.73s |
| DA3-G | Scale | 0.333 | 0.216 | 0.360 | 0.275 | 0.155 | 0.128 | 0.252 | 0.212 | 0.206 | 0.222 | ~31.79s |
| DA3-G† | Metric | 0.335 | 0.210 | 0.381 | 0.256 | 0.222 | 0.156 | 0.295 | 0.195 | 0.270 | 0.244 | ~31.79s |
| MapAny. | Metric | 0.288 | 0.361 | 0.330 | 0.201 | 0.125 | 0.100 | 0.361 | 0.150 | 0.198 | 0.190 | ~34.49s |
| Ours | Metric | 0.223 | 0.133 | 0.281 | 0.173 | 0.170 | 0.125 | 0.221 | 0.128 | 0.145 | 0.116 | ~25.12s |
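The Acc. and Comp. columns above are point-cloud reconstruction metrics. A minimal sketch of their usual definitions follows: accuracy is the average distance from each predicted point to its nearest ground-truth point, and completeness is the same quantity in the reverse direction. Whether the reported numbers use a mean or a median, and any distance thresholds or subsampling, are not stated on this page, so the choices below (mean, brute-force search) are assumptions for illustration only.

```python
import numpy as np

def nearest_distances(src, dst):
    """Distance from each point in src (N, 3) to its nearest point in dst (M, 3).

    Brute-force O(N * M) search; fine for small clouds, use a KD-tree for large ones.
    """
    diff = src[:, None, :] - dst[None, :, :]          # (N, M, 3)
    return np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)

def accuracy_completeness(pred_pts, gt_pts):
    """Accuracy (pred -> gt) and completeness (gt -> pred), both lower-is-better."""
    acc = float(nearest_distances(pred_pts, gt_pts).mean())
    comp = float(nearest_distances(gt_pts, pred_pts).mean())
    return acc, comp


# Toy example: the prediction is exact where it exists but misses one region.
pred_demo = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
gt_demo = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [3.0, 0.0, 0.0]])
acc_demo, comp_demo = accuracy_completeness(pred_demo, gt_demo)
```

In the toy example the accuracy is zero because every predicted point lies on the ground truth, while the completeness is nonzero because one ground-truth point has no nearby prediction; the two metrics therefore penalize spurious geometry and missing geometry separately.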
Interactive point-cloud reconstructions compare DA3, MoGe-2, and SurroundNEXO across diverse surround-view driving scenes, highlighting cross-view geometry and metric consistency.
@article{yuan2026surroundnexo,
title = {SurroundNEXO: Ego-Centric Metric Bridging for Spatially Consistent Geometry in Autonomous Driving},
author = {Yuan, Shuai and Tang, Runxi and Ji, Yuzhou and Ge, Fudong and Wang, Hanshi and Wang, Yifei and Zeng, Xianming and Xu, Jianyun and Liu, Xingliang and Wang, Yanfeng and Zhang, Zhipeng},
journal = {arXiv preprint arXiv:2606.16960},
year = {2026}
}