SurroundNEXO

Ego-Centric Metric Bridging for Spatially Consistent Geometry in Autonomous Driving

Shuai Yuan1,2,* Runxi Tang1 Yuzhou Ji1 Fudong Ge1,2 Hanshi Wang1,2 Yifei Wang2 Xianming Zeng2 Jianyun Xu2 Xingliang Liu2 Yanfeng Wang1 Zhipeng Zhang1,†

1School of Artificial Intelligence, Shanghai Jiao Tong University
2Hello Inc.
†Corresponding author

SurroundNEXO teaser

SurroundNEXO bridges low-overlap surround cameras through an ego-centric metric reference instead of relying on dense visual correspondences.

Abstract

Modern autonomous driving requires metric 3D understanding that is accurate within each camera and spatially consistent across a surround-view rig. SurroundNEXO grounds cross-view reasoning in ego-centric geometry, uses sparse LiDAR observations as metric anchors, and progressively expands feature interaction from view-local modeling to global integration.

33.2% Single-view Accuracy
10.5% Cross-view Consistency
25.6% 3D Reconstruction Quality

Method

Given surround-view images, camera rigs, and sparse LiDAR, SurroundNEXO grounds images with ego-frame ray priors and sparse metric anchors, then progressively expands feature interaction to produce metric and spatially consistent depth predictions.

SurroundNEXO pipeline

ERPE

Ego-Ray Positional Encoding maps image patches to globally comparable ego-frame viewing directions.
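
To make "globally comparable ego-frame viewing directions" concrete, the sketch below computes one unit ray per image patch and rotates it into the ego frame. It only illustrates the underlying geometry: the function name `patch_ego_rays`, the patch size of 14, and the pinhole camera model are assumptions, and how SurroundNEXO turns these directions into a positional encoding is not described on this page.

```python
import numpy as np

def patch_ego_rays(K, R_cam_to_ego, image_hw, patch=14):
    """Unit viewing direction, in the ego frame, for each patch center.

    K            : (3, 3) pinhole intrinsics (assumed camera model)
    R_cam_to_ego : (3, 3) rotation from camera axes to ego axes
    image_hw     : (height, width) in pixels
    patch        : patch side length in pixels (hypothetical default)
    """
    h, w = image_hw
    # Pixel coordinates of the patch centers.
    us = np.arange(patch / 2, w, patch)
    vs = np.arange(patch / 2, h, patch)
    u, v = np.meshgrid(us, vs)                          # each (Hp, Wp)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)    # (Hp, Wp, 3)
    # Back-project through the intrinsics: d_cam is proportional to K^-1 [u, v, 1]^T.
    d_cam = pix @ np.linalg.inv(K).T
    # Rotate into the shared ego frame and normalize to unit length.
    d_ego = d_cam @ R_cam_to_ego.T
    return d_ego / np.linalg.norm(d_ego, axis=-1, keepdims=True)
```

Because every camera's rays are expressed in the same frame, the optical axes of a front and a rear camera come out as opposite vectors, even though the two images share no pixels.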

SMA

Sparse Metric Anchoring propagates absolute scale from sparse LiDAR observations to dense visual tokens.
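
Before absolute scale can be propagated to visual tokens, the sparse LiDAR returns have to be associated with them. The sketch below shows one plausible way to do that: project ego-frame points into a camera and average their depths per patch. The function name `lidar_patch_anchors`, the mean pooling, and the patch size are assumptions for illustration; the propagation mechanism itself is not specified here and is not shown.

```python
import numpy as np

def lidar_patch_anchors(points_ego, K, T_ego_to_cam, image_hw, patch=14):
    """Rasterize ego-frame LiDAR points into per-patch metric depth anchors.

    Returns (depth, valid): depth[i, j] is the mean depth in metres of the
    points that project into patch (i, j); valid[i, j] is True when at least
    one point landed there. Mean pooling is an assumption of this sketch.
    """
    h, w = image_hw
    hp, wp = h // patch, w // patch
    # Transform the points into the camera frame with a 4x4 rigid transform.
    pts_h = np.concatenate([points_ego, np.ones((len(points_ego), 1))], axis=1)
    pts_cam = (pts_h @ T_ego_to_cam.T)[:, :3]
    z = pts_cam[:, 2]
    in_front = z > 1e-3                      # drop points behind the camera
    pts_cam, z = pts_cam[in_front], z[in_front]
    # Pinhole projection to pixel coordinates.
    uv = (pts_cam @ K.T)[:, :2] / z[:, None]
    col = np.floor(uv[:, 0] / patch).astype(int)
    row = np.floor(uv[:, 1] / patch).astype(int)
    inside = (col >= 0) & (col < wp) & (row >= 0) & (row < hp)
    flat = row[inside] * wp + col[inside]
    # Accumulate depth sums and hit counts per patch.
    depth_sum = np.bincount(flat, weights=z[inside], minlength=hp * wp)
    count = np.bincount(flat, minlength=hp * wp)
    valid = count > 0
    depth = np.zeros(hp * wp)
    depth[valid] = depth_sum[valid] / count[valid]
    return depth.reshape(hp, wp), valid.reshape(hp, wp)
```

With typical automotive LiDAR, most patches stay invalid, which is why the anchors are described as sparse and the scale has to be spread to the remaining tokens.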

PGT

Progressive Geometry Transformer schedules interaction from intra-view to cross-view, cross-frame, and global reasoning.
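
One way to picture the schedule is as a sequence of attention masks that widen from stage to stage. The sketch below builds such masks from per-token frame and view indices. The stage names, the exact grouping rule for each stage, and the helper `stage_mask` are assumptions made for illustration, not the published architecture.

```python
import numpy as np

# Stage order taken from the description above; the names are hypothetical.
STAGES = ("intra_view", "cross_view", "cross_frame", "global")

def stage_mask(frame_id, view_id, stage):
    """Boolean (N, N) mask; entry [i, j] is True if token i may attend to j.

    frame_id, view_id : (N,) integer arrays giving each token's frame and camera.
    """
    same_frame = frame_id[:, None] == frame_id[None, :]
    same_view = view_id[:, None] == view_id[None, :]
    if stage == "intra_view":      # tokens of one image only
        return same_frame & same_view
    if stage == "cross_view":      # all cameras at the same timestamp (assumed)
        return same_frame
    if stage == "cross_frame":     # the same camera across time (assumed)
        return same_view
    if stage == "global":          # every token sees every token
        return np.ones_like(same_frame)
    raise ValueError(f"unknown stage: {stage}")
```

Under these assumptions, a transformer block at each stage would apply its mask to the attention logits, so early layers model each image in isolation and later layers integrate the whole rig and sequence.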

Results

Across depth accuracy, cross-view consistency, sparse-prompt robustness, and 3D reconstruction, SurroundNEXO consistently improves metric geometry in low-overlap autonomous driving scenes.

Method      Alignment Waymo                 NuScenes              KITTI                 DDAD                  OpenScene
                      Abs Rel ↓  δ<1.25 ↑   Abs Rel ↓  δ<1.25 ↑   Abs Rel ↓  δ<1.25 ↑   Abs Rel ↓  δ<1.25 ↑   Abs Rel ↓  δ<1.25 ↑
Single-view
Depth-Pro   Metric    0.390      0.230      0.236      0.522      0.135      0.876      0.450      0.176      0.259      0.477
UniDepth-2  Metric    0.147      0.829      0.120      0.934      0.071      0.962      0.091      0.941      0.142      0.913
MoGe-2      Metric    0.177      0.630      0.129      0.851      0.240      0.273      0.124      0.796      0.094      0.937
MetricAny.  Metric    0.097      0.943      0.160      0.851      0.151      0.710      0.085      0.944      0.152      0.860
Multi-view
VGGT        Scale     0.134      0.857      0.213      0.773      0.092      0.941      0.185      0.707      0.178      0.808
Pi3         Scale     0.096      0.914      0.122      0.869      0.056      0.972      0.139      0.802      0.128      0.889
OmniVGGT    Scale     0.104      0.925      0.194      0.762      0.095      0.927      0.195      0.682      0.267      0.814
MapAny.     Metric    0.096      0.911      0.115      0.879      0.096      0.922      0.114      0.867      0.115      0.870
DVGT        Metric    0.227      0.615      0.127      0.871      0.068      0.969      0.097      0.922      0.110      0.909
DA3-G       Scale     0.172      0.789      0.147      0.827      0.063      0.969      0.142      0.812      0.132      0.875
DA3-G†      Metric    0.082      0.937      0.112      0.911      0.102      0.929      0.114      0.906      0.105      0.916
Ours        Metric    0.048      0.975      0.079      0.950      0.070      0.969      0.077      0.956      0.084      0.938
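
For readers unfamiliar with the two metrics in the table above, the sketch below computes Abs Rel and δ<1.25 using their standard definitions. The `align` switch is an assumption about what the Alignment column means: "metric" evaluates predictions as-is, while "scale" first rescales them to the ground truth. Median scaling is one common choice and may differ from the protocol used for these results.

```python
import numpy as np

def depth_metrics(pred, gt, align="metric"):
    """Return (abs_rel, delta1) over pixels with valid ground truth (gt > 0).

    align="metric": compare raw predictions against ground truth.
    align="scale" : rescale predictions by a single median ratio first
                    (an assumed alignment rule, see the note above).
    """
    valid = gt > 0
    p, g = pred[valid], gt[valid]
    if align == "scale":
        p = p * (np.median(g) / np.median(p))
    elif align != "metric":
        raise ValueError(f"unknown alignment: {align}")
    # Abs Rel: mean of |pred - gt| / gt (lower is better).
    abs_rel = float(np.mean(np.abs(p - g) / g))
    # delta < 1.25: fraction of pixels with max(pred/gt, gt/pred) < 1.25
    # (higher is better).
    ratio = np.maximum(p / g, g / p)
    delta1 = float(np.mean(ratio < 1.25))
    return abs_rel, delta1
```

A prediction that is exactly twice the true depth everywhere scores perfectly under "scale" alignment but poorly under "metric", which is why the two groups of rows are not directly interchangeable.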

Method      NuScenes                                    DDAD
            Abs Rel ↓  Sq Rel ↓   RMSE ↓     δ<1.25 ↑   Abs Rel ↓  Sq Rel ↓   RMSE ↓     δ<1.25 ↑
Single-view
Depth-Pro   0.353      3.482      6.461      0.476      0.625      6.654      7.772      0.276
UniDepth-2  0.198      2.691      6.216      0.341      0.196      2.228      6.205      0.763
MoGe-2      0.235      2.683      6.281      0.680      0.347      3.598      7.274      0.544
MetricAny.  0.208      2.512      6.156      0.728      0.198      2.340      5.834      0.799
Multi-view
VGGT        0.220      2.272      5.608      0.660      0.303      2.522      5.680      0.629
Pi3         0.211      2.387      5.959      0.707      0.321      3.571      6.865      0.604
OmniVGGT    0.210      1.977      4.671      0.660      0.269      2.083      5.336      0.562
MapAny.     0.328      5.348      7.212      0.741      0.490      8.889      8.428      0.682
DVGT        0.229      2.629      5.949      0.705      0.241      2.275      5.994      0.712
DA3-G       0.217      2.111      5.443      0.684      0.314      3.633      5.945      0.576
DA3-G†      0.173      1.897      5.132      0.801      0.238      2.218      5.589      0.769
Ours        0.139      1.654      4.527      0.850      0.162      1.840      5.333      0.837

Method    Waymo                                                             NuScenes
          Avg.                  Lidar-4               Random-0.1%           Avg.                  Lidar-4               Random-0.1%
          Abs Rel ↓  RMSE ↓     Abs Rel ↓  RMSE ↓     Abs Rel ↓  RMSE ↓     Abs Rel ↓  RMSE ↓     Abs Rel ↓  RMSE ↓     Abs Rel ↓  RMSE ↓
Omni-DC   0.133      4.298      0.211      6.534      0.208      5.592      0.287      6.555      0.158      4.163      0.754      15.056
PromptDA  0.495      14.271     0.503      14.334     0.453      12.785     0.507      12.047     0.507      11.717     0.538      13.331
PriorDA   0.054      2.258      0.057      2.455      0.053      2.336      0.084      2.567      0.075      2.280      0.139      3.976
LB-Depth  0.126      4.433      0.069      3.544      0.314      7.780      0.173      5.280      0.119      4.937      0.327      6.560
Any2Full  0.090      2.850      0.117      2.484      0.325      6.719      0.542      6.946      0.080      2.754      1.927      19.469
Ours      0.053      1.946      0.054      1.989      0.049      1.843      0.087      2.005      0.083      1.893      0.103      2.343
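
The Random-0.1% column refers to a very sparse depth prompt. As a rough illustration of how such a prompt could be simulated from a dense depth map, the sketch below keeps a random 0.1% of pixels and zeroes the rest. Whether the fraction is taken over all pixels or only valid ones, and the helper name `random_sparse_prompt`, are assumptions; the actual sampling protocol is not stated on this page.

```python
import numpy as np

def random_sparse_prompt(depth, fraction=0.001, seed=0):
    """Keep a random subset of valid depth pixels and zero out the rest.

    The number kept is `fraction` of all pixels (an assumption), capped by
    the number of pixels that actually carry a depth value (> 0).
    """
    rng = np.random.default_rng(seed)
    flat_depth = depth.reshape(-1)
    valid_idx = np.flatnonzero(flat_depth > 0)
    k = min(valid_idx.size, int(round(fraction * flat_depth.size)))
    sparse = np.zeros(flat_depth.size, dtype=depth.dtype)
    if k > 0:
        # Sample without replacement so no pixel is counted twice.
        keep = rng.choice(valid_idx, size=k, replace=False)
        sparse[keep] = flat_depth[keep]
    return sparse.reshape(depth.shape)
```

For a 100 x 100 depth map this leaves only 10 measurements, which gives a sense of how little metric evidence the model has to work with in this setting.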

Method      Alignment Waymo             NuScenes          KITTI             DDAD              OpenScene         Time
                      Acc. ↓   Comp. ↓  Acc. ↓   Comp. ↓  Acc. ↓   Comp. ↓  Acc. ↓   Comp. ↓  Acc. ↓   Comp. ↓
VGGT        Scale     0.554    0.290    0.603    0.346    0.182    0.258    0.410    0.272    0.246    0.219    ~55.62s
Pi3         Scale     0.585    0.258    0.400    0.249    0.144    0.111    0.310    0.172    0.249    0.171    ~35.08s
OmniVGGT    Scale     0.525    0.387    0.526    0.430    0.176    0.326    0.409    0.513    0.268    0.432    ~45.59s
DVGT        Metric    0.756    0.481    1.078    0.808    0.522    0.743    0.371    0.705    0.306    0.381    ~19.73s
DA3-G       Scale     0.333    0.216    0.360    0.275    0.155    0.128    0.252    0.212    0.206    0.222    ~31.79s
DA3-G†      Metric    0.335    0.210    0.381    0.256    0.222    0.156    0.295    0.195    0.270    0.244    ~31.79s
MapAny.     Metric    0.288    0.361    0.330    0.201    0.125    0.100    0.361    0.150    0.198    0.190    ~34.49s
Ours        Metric    0.223    0.133    0.281    0.173    0.170    0.125    0.221    0.128    0.145    0.116    ~25.12s
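
Acc. and Comp. in the reconstruction table are point-cloud distances. The sketch below uses their usual definitions: accuracy is the mean distance from each predicted point to its nearest ground-truth point, and completeness is the same measured in the other direction. Using the mean (some benchmarks use the median) and the brute-force neighbour search are assumptions of this sketch; the evaluation details behind the reported numbers are not given here.

```python
import numpy as np

def nearest_distance(src, dst):
    """Distance from every point in src (N, 3) to its nearest point in dst (M, 3).

    Brute force with O(N * M) memory; fine for small clouds, while a KD-tree
    would be the usual choice for full scenes.
    """
    d2 = ((src[:, None, :] - dst[None, :, :]) ** 2).sum(axis=-1)
    return np.sqrt(d2.min(axis=1))

def accuracy_completeness(pred_pts, gt_pts):
    """Return (accuracy, completeness); both are lower-is-better."""
    # Accuracy: how far predicted points stray from the true surface.
    acc = float(nearest_distance(pred_pts, gt_pts).mean())
    # Completeness: how much of the true surface the prediction misses.
    comp = float(nearest_distance(gt_pts, pred_pts).mean())
    return acc, comp
```

The two numbers are complementary: a reconstruction that predicts only a few very precise points scores well on accuracy but poorly on completeness.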

Qualitative Results

Interactive point-cloud reconstructions compare DA3, MoGe-2, and SurroundNEXO across diverse surround-view driving scenes, highlighting cross-view geometry and metric consistency.

BibTeX

@article{yuan2026surroundnexo,
  title   = {SurroundNEXO: Ego-Centric Metric Bridging for Spatially Consistent Geometry in Autonomous Driving},
  author  = {Yuan, Shuai and Tang, Runxi and Ji, Yuzhou and Ge, Fudong and Wang, Hanshi and Wang, Yifei and Zeng, Xianming and Xu, Jianyun and Liu, Xingliang and Wang, Yanfeng and Zhang, Zhipeng},
  journal = {arXiv preprint arXiv:2606.16960},
  year    = {2026}
}