SurroundNEXO

Ego-Centric Metric Bridging for Spatially Consistent Geometry in Autonomous Driving

Shuai Yuan¹²*, Runxi Tang¹, Yuzhou Ji¹, Fudong Ge¹², Hanshi Wang¹², Yifei Wang², Xianming Zeng², Jianyun Xu², Xingliang Liu², Yanfeng Wang¹, Zhipeng Zhang¹†

¹School of Artificial Intelligence, Shanghai Jiao Tong University
²Hello Inc.
†Corresponding author

SurroundNEXO bridges low-overlap surround cameras through an ego-centric metric reference instead of relying on dense visual correspondences.

Abstract

Modern autonomous driving requires metric 3D understanding that is accurate within each camera and spatially consistent across a surround-view rig. SurroundNEXO grounds cross-view reasoning in ego-centric geometry, uses sparse LiDAR observations as metric anchors, and progressively expands feature interaction from view-local modeling to global integration.

33.2% ↑ single-view accuracy

10.5% ↑ cross-view consistency

25.6% ↑ 3D reconstruction quality

Method

Given surround-view images, camera rigs, and sparse LiDAR, SurroundNEXO grounds images with ego-frame ray priors and sparse metric anchors, then progressively expands feature interaction to produce metric and spatially consistent depth predictions.

ERPE

Ego-Ray Positional Encoding maps image patches to globally comparable ego-frame viewing directions.
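
To make "globally comparable ego-frame viewing directions" concrete, the sketch below back-projects each patch center through a zero-skew pinhole intrinsics model and rotates the ray into the ego frame, so patches from different cameras that look the same way receive the same direction. The function names, the 16-pixel patch size, the axis conventions, and the sinusoidal `direction_encoding` are assumptions for illustration; the page does not specify how ERPE turns directions into embeddings.

```python
import math


def patch_ray_directions(fx, fy, cx, cy, R_cam_to_ego, width, height, patch=16):
    """Unit ego-frame viewing direction through the center of each image patch.

    Assumes a zero-skew pinhole camera. R_cam_to_ego is a 3x3 rotation given as a
    list of rows; the camera-to-ego translation does not affect directions.
    """
    grid = []
    for row in range(height // patch):
        grid_row = []
        for col in range(width // patch):
            # Pixel coordinates of the patch center.
            u = (col + 0.5) * patch
            v = (row + 0.5) * patch
            # Back-project with K^-1 [u, v, 1]^T (zero-skew pinhole).
            d_cam = ((u - cx) / fx, (v - cy) / fy, 1.0)
            # Rotate the ray into the shared ego frame.
            d_ego = [sum(R_cam_to_ego[i][j] * d_cam[j] for j in range(3)) for i in range(3)]
            norm = math.sqrt(sum(c * c for c in d_ego))
            grid_row.append(tuple(c / norm for c in d_ego))
        grid.append(grid_row)
    return grid


def direction_encoding(direction, num_freqs=4):
    """Hypothetical sinusoidal encoding of a unit direction (not the paper's)."""
    feats = []
    for k in range(num_freqs):
        for c in direction:
            feats.append(math.sin((2 ** k) * math.pi * c))
            feats.append(math.cos((2 ** k) * math.pi * c))
    return feats


# Front camera: camera x-right / y-down / z-forward, ego x-forward / y-left / z-up.
R_FRONT = [[0, 0, 1], [-1, 0, 0], [0, -1, 0]]
rays = patch_ray_directions(500.0, 500.0, 24.0, 24.0, R_FRONT, width=48, height=48)
print(rays[1][1])  # center patch looks straight ahead: (1.0, 0.0, 0.0)
```

A side camera would use a different rotation, yet its forward-facing patch would map to a lateral ego direction, which is what makes the encoding comparable across views.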

SMA

Sparse Metric Anchoring propagates absolute scale from sparse LiDAR observations to dense visual tokens.
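
The page does not describe how LiDAR points become anchors. A minimal sketch, assuming points are given in the ego frame and anchors live on the same patch grid as the image tokens, is to project them into each camera and average their depth per patch. The hypothetical `lidar_patch_anchors` below does only that; the learned propagation from these sparse anchors to dense tokens is not shown.

```python
def lidar_patch_anchors(points_ego, R_ego_to_cam, t_ego_to_cam, fx, fy, cx, cy,
                        width, height, patch=16):
    """Project ego-frame LiDAR points into one pinhole camera and average depth per patch.

    Returns (anchors, mask): anchors[r][c] is the mean camera-frame depth (same unit as
    the points, e.g. meters) of points landing in patch (r, c); mask marks patches hit.
    """
    rows, cols = height // patch, width // patch
    sums = [[0.0] * cols for _ in range(rows)]
    counts = [[0] * cols for _ in range(rows)]
    for p in points_ego:
        # Rigid transform into the camera frame: x_cam = R @ x_ego + t.
        x, y, z = (sum(R_ego_to_cam[i][j] * p[j] for j in range(3)) + t_ego_to_cam[i]
                   for i in range(3))
        if z <= 0:
            continue  # behind the image plane
        u, v = fx * x / z + cx, fy * y / z + cy
        if not (0 <= u < width and 0 <= v < height):
            continue  # outside the image
        r, c = int(v) // patch, int(u) // patch
        if r >= rows or c >= cols:
            continue  # border strip when the size is not a multiple of the patch size
        sums[r][c] += z
        counts[r][c] += 1
    anchors = [[sums[r][c] / counts[r][c] if counts[r][c] else 0.0 for c in range(cols)]
               for r in range(rows)]
    mask = [[counts[r][c] > 0 for c in range(cols)] for r in range(rows)]
    return anchors, mask


# Ego x-forward/y-left/z-up into camera x-right/y-down/z-forward (front camera).
R_EGO_TO_FRONT = [[0, -1, 0], [0, 0, -1], [1, 0, 0]]
points = [(10.0, 0.0, 0.0), (20.0, 0.0, 0.0), (-5.0, 0.0, 0.0)]  # last one is behind
anchors, mask = lidar_patch_anchors(points, R_EGO_TO_FRONT, (0.0, 0.0, 0.0),
                                    500.0, 500.0, 24.0, 24.0, width=48, height=48)
print(anchors[1][1], sum(m for row in mask for m in row))  # 15.0 1
```

Most patches stay empty (mask is False), which is why the abstract frames LiDAR as sparse anchors rather than a dense depth input.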

PGT

Progressive Geometry Transformer schedules interaction from intra-view to cross-view, cross-frame, and global reasoning.
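
One way to picture the progressive schedule is as a sequence of attention masks over tokens indexed by (frame, view, patch), each stage widening the set of keys a query may attend to. This is a hypothetical sketch: the stage names follow the description above, but treating "cross-frame" as same-camera attention across timestamps, the number of stages applied, and the helper names are assumptions.

```python
from itertools import product

# Hypothetical stage order mirroring the description: intra-view -> cross-view
# -> cross-frame -> global. Layer counts per stage are not given on this page.
STAGES = ("intra_view", "cross_view", "cross_frame", "global")


def can_attend(stage, query, key):
    """query/key are (frame, view) ids of two tokens; returns True if attention is allowed."""
    (fq, vq), (fk, vk) = query, key
    if stage == "intra_view":
        return fq == fk and vq == vk  # patches of the same image only
    if stage == "cross_view":
        return fq == fk  # all cameras at the same timestamp
    if stage == "cross_frame":
        return vq == vk  # same camera across timestamps (an assumption)
    if stage == "global":
        return True  # every token in the clip
    raise ValueError(f"unknown stage: {stage}")


def attention_mask(stage, num_frames, num_views, patches_per_image):
    """Boolean N x N mask over tokens ordered by (frame, view, patch)."""
    tokens = [(f, v) for f, v, _ in product(range(num_frames), range(num_views),
                                            range(patches_per_image))]
    return [[can_attend(stage, q, k) for k in tokens] for q in tokens]


for stage in STAGES:
    mask = attention_mask(stage, num_frames=2, num_views=3, patches_per_image=2)
    print(stage, sum(map(sum, mask)))  # allowed query-key pairs out of 12 x 12
```

With 2 frames, 3 cameras, and 2 patches per image, the loop prints 24, 72, 48, and 144 allowed pairs, showing how each stage enlarges the receptive field before full global attention.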

Results

Across depth accuracy, cross-view consistency, sparse-prompt robustness, and 3D reconstruction, SurroundNEXO improves metric geometry in low-overlap autonomous driving scenes, achieving the best score on most datasets and metrics.

Depth accuracy
Method	Alignment	Waymo Abs Rel ↓	Waymo δ<1.25 ↑	NuScenes Abs Rel ↓	NuScenes δ<1.25 ↑	KITTI Abs Rel ↓	KITTI δ<1.25 ↑	DDAD Abs Rel ↓	DDAD δ<1.25 ↑	OpenScene Abs Rel ↓	OpenScene δ<1.25 ↑
Single-view
Depth-Pro	Metric	0.390	0.230	0.236	0.522	0.135	0.876	0.450	0.176	0.259	0.477
UniDepth-2	Metric	0.147	0.829	0.120	0.934	0.071	0.962	0.091	0.941	0.142	0.913
MoGe-2	Metric	0.177	0.630	0.129	0.851	0.240	0.273	0.124	0.796	0.094	0.937
MetricAny.	Metric	0.097	0.943	0.160	0.851	0.151	0.710	0.085	0.944	0.152	0.860
Multi-view
VGGT	Scale	0.134	0.857	0.213	0.773	0.092	0.941	0.185	0.707	0.178	0.808
Pi3	Scale	0.096	0.914	0.122	0.869	0.056	0.972	0.139	0.802	0.128	0.889
OmniVGGT	Scale	0.104	0.925	0.194	0.762	0.095	0.927	0.195	0.682	0.267	0.814
MapAny.	Metric	0.096	0.911	0.115	0.879	0.096	0.922	0.114	0.867	0.115	0.870
DVGT	Metric	0.227	0.615	0.127	0.871	0.068	0.969	0.097	0.922	0.110	0.909
DA3-G	Scale	0.172	0.789	0.147	0.827	0.063	0.969	0.142	0.812	0.132	0.875
DA3-G†	Metric	0.082	0.937	0.112	0.911	0.102	0.929	0.114	0.906	0.105	0.916
Ours	Metric	0.048	0.975	0.079	0.950	0.070	0.969	0.077	0.956	0.084	0.938
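
The Alignment column separates methods scored as-is ("Metric") from methods whose predictions are rescaled before scoring ("Scale"). The page does not state the alignment protocol; the sketch below assumes per-image median scaling, a common choice, together with the standard definitions of Abs Rel and δ<1.25 over valid ground-truth pixels. The function names are illustrative.

```python
from statistics import median


def median_scale_align(pred, gt):
    """Rescale a prediction by median(gt) / median(pred).

    One common protocol for scale-ambiguous methods (assumed, not confirmed here).
    """
    s = median(gt) / median(pred)
    return [s * p for p in pred]


def abs_rel(pred, gt):
    """Mean absolute relative error, mean(|pred - gt| / gt), over valid pixels."""
    return sum(abs(p - g) / g for p, g in zip(pred, gt)) / len(gt)


def delta_125(pred, gt):
    """Fraction of pixels with max(pred / gt, gt / pred) < 1.25."""
    return sum(max(p / g, g / p) < 1.25 for p, g in zip(pred, gt)) / len(gt)


gt = [10.0, 20.0, 40.0]  # valid ground-truth depths in meters (gt > 0)
pred = [1.0, 2.0, 4.0]   # correct shape, wrong absolute scale
print(abs_rel(pred, gt), delta_125(pred, gt))        # ~0.9 0.0 when scored as metric
aligned = median_scale_align(pred, gt)
print(abs_rel(aligned, gt), delta_125(aligned, gt))  # 0.0 1.0 after scale alignment
```

The example shows why the two groups are not directly comparable: scale alignment removes exactly the absolute-scale error that "Metric" methods must get right on their own.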

Cross-view consistency
Method	NuScenes Abs Rel ↓	NuScenes Sq Rel ↓	NuScenes RMSE ↓	NuScenes δ<1.25 ↑	DDAD Abs Rel ↓	DDAD Sq Rel ↓	DDAD RMSE ↓	DDAD δ<1.25 ↑
Single-view
Depth-Pro	0.353	3.482	6.461	0.476	0.625	6.654	7.772	0.276
UniDepth-2	0.198	2.691	6.216	0.341	0.196	2.228	6.205	0.763
MoGe-2	0.235	2.683	6.281	0.680	0.347	3.598	7.274	0.544
MetricAny.	0.208	2.512	6.156	0.728	0.198	2.340	5.834	0.799
Multi-view
VGGT	0.220	2.272	5.608	0.660	0.303	2.522	5.680	0.629
Pi3	0.211	2.387	5.959	0.707	0.321	3.571	6.865	0.604
OmniVGGT	0.210	1.977	4.671	0.660	0.269	2.083	5.336	0.562
MapAny.	0.328	5.348	7.212	0.741	0.490	8.889	8.428	0.682
DVGT	0.229	2.629	5.949	0.705	0.241	2.275	5.994	0.712
DA3-G	0.217	2.111	5.443	0.684	0.314	3.633	5.945	0.576
DA3-G†	0.173	1.897	5.132	0.801	0.238	2.218	5.589	0.769
Ours	0.139	1.654	4.527	0.850	0.162	1.840	5.333	0.837
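
The table above also reports Sq Rel and RMSE. Their standard definitions over valid ground-truth pixels are sketched below, assuming depths in meters; how pixels are selected for this evaluation is not specified on the page, so only the per-pixel errors are shown.

```python
import math


def sq_rel(pred, gt):
    """Squared relative error: mean((pred - gt)^2 / gt) over valid pixels."""
    return sum((p - g) ** 2 / g for p, g in zip(pred, gt)) / len(gt)


def rmse(pred, gt):
    """Root mean squared error in the depth unit (meters here)."""
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(gt))


gt = [10.0, 20.0]
pred = [12.0, 18.0]  # 2 m error on each pixel
print(sq_rel(pred, gt), rmse(pred, gt))  # ~0.3 2.0
```

Sq Rel divides by the true depth, so the same 2 m error costs more on near pixels, while RMSE weights all pixels equally in meters.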

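Sparse-prompt robustness
Method	Waymo Avg. Abs Rel ↓	Waymo Avg. RMSE ↓	Waymo Lidar-4 Abs Rel ↓	Waymo Lidar-4 RMSE ↓	Waymo Random-0.1% Abs Rel ↓	Waymo Random-0.1% RMSE ↓	NuScenes Avg. Abs Rel ↓	NuScenes Avg. RMSE ↓	NuScenes Lidar-4 Abs Rel ↓	NuScenes Lidar-4 RMSE ↓	NuScenes Random-0.1% Abs Rel ↓	NuScenes Random-0.1% RMSE ↓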
Omni-DC	0.133	4.298	0.211	6.534	0.208	5.592	0.287	6.555	0.158	4.163	0.754	15.056
PromptDA	0.495	14.271	0.503	14.334	0.453	12.785	0.507	12.047	0.507	11.717	0.538	13.331
PriorDA	0.054	2.258	0.057	2.455	0.053	2.336	0.084	2.567	0.075	2.280	0.139	3.976
LB-Depth	0.126	4.433	0.069	3.544	0.314	7.780	0.173	5.280	0.119	4.937	0.327	6.560
Any2Full	0.090	2.850	0.117	2.484	0.325	6.719	0.542	6.946	0.080	2.754	1.927	19.469
Ours	0.053	1.946	0.054	1.989	0.049	1.843	0.087	2.005	0.083	1.893	0.103	2.343
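
The sparse-prompt settings are named but not defined on this page. Reading "Random-0.1%" as uniformly sampling 0.1% of valid depth pixels, and "Lidar-4" as keeping four beams of a multi-beam LiDAR sweep, gives the hypothetical helpers below. The 64-beam sensor, the even ring spacing, and the fixed seed are assumptions, not the benchmark's protocol.

```python
import random


def random_fraction_prompt(depth, fraction=0.001, seed=0):
    """Keep a random `fraction` of valid (> 0) pixels of a flattened depth map.

    Pixels that are not kept are set to 0.0, meaning "no prompt".
    """
    valid = [i for i, d in enumerate(depth) if d > 0]
    if not valid:
        return [0.0] * len(depth)
    k = max(1, round(fraction * len(valid)))
    keep = set(random.Random(seed).sample(valid, k))
    return [d if i in keep else 0.0 for i, d in enumerate(depth)]


def keep_lidar_beams(points, total_beams=64, kept_beams=4):
    """Hypothetical beam subsampling: keep every (total_beams // kept_beams)-th ring.

    `points` are (x, y, z, ring) tuples with ring in [0, total_beams).
    """
    stride = total_beams // kept_beams
    return [p for p in points if p[3] % stride == 0]


dense = [5.0] * 100_000
prompt = random_fraction_prompt(dense)
print(sum(d > 0 for d in prompt))  # 100 prompt pixels (0.1%)
sweep = [(1.0, 0.0, 0.0, ring) for ring in range(64)]
print(sorted(p[3] for p in keep_lidar_beams(sweep)))  # [0, 16, 32, 48]
```

The two settings stress different failure modes: random points are spread uniformly but extremely sparse, while a few LiDAR beams are dense along scan lines and empty in between.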

3D reconstruction
Method	Alignment	Waymo Acc. ↓	Waymo Comp. ↓	NuScenes Acc. ↓	NuScenes Comp. ↓	KITTI Acc. ↓	KITTI Comp. ↓	DDAD Acc. ↓	DDAD Comp. ↓	OpenScene Acc. ↓	OpenScene Comp. ↓	Time
VGGT	Scale	0.554	0.290	0.603	0.346	0.182	0.258	0.410	0.272	0.246	0.219	~55.62s
Pi3	Scale	0.585	0.258	0.400	0.249	0.144	0.111	0.310	0.172	0.249	0.171	~35.08s
OmniVGGT	Scale	0.525	0.387	0.526	0.430	0.176	0.326	0.409	0.513	0.268	0.432	~45.59s
DVGT	Metric	0.756	0.481	1.078	0.808	0.522	0.743	0.371	0.705	0.306	0.381	~19.73s
DA3-G	Scale	0.333	0.216	0.360	0.275	0.155	0.128	0.252	0.212	0.206	0.222	~31.79s
DA3-G†	Metric	0.335	0.210	0.381	0.256	0.222	0.156	0.295	0.195	0.270	0.244	~31.79s
MapAny.	Metric	0.288	0.361	0.330	0.201	0.125	0.100	0.361	0.150	0.198	0.190	~34.49s
Ours	Metric	0.223	0.133	0.281	0.173	0.170	0.125	0.221	0.128	0.145	0.116	~25.12s
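
Acc. and Comp. are not defined on this page either. A common definition in point-cloud reconstruction evaluation, assumed here, is the mean nearest-neighbor distance from predicted points to ground truth (accuracy) and from ground truth to predicted points (completeness), both lower-is-better. The brute-force search below is O(NM) and meant only for illustration; real evaluations typically use a KD-tree, and some report medians instead of means.

```python
import math


def _nearest(point, cloud):
    """Distance from `point` to its nearest neighbor in `cloud` (brute force)."""
    return min(math.dist(point, q) for q in cloud)


def accuracy(pred, gt):
    """Mean distance from each predicted point to the closest ground-truth point."""
    return sum(_nearest(p, gt) for p in pred) / len(pred)


def completeness(pred, gt):
    """Mean distance from each ground-truth point to the closest predicted point."""
    return sum(_nearest(g, pred) for g in gt) / len(gt)


gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
pred = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.5)]  # misses the second GT point
print(accuracy(pred, gt), completeness(pred, gt))  # 0.25 0.5
```

The asymmetry matters: a prediction can score well on accuracy by placing few, precise points, and completeness is what penalizes the missing geometry.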

Qualitative Results

Point-cloud reconstructions compare DA3, MoGe-2, and SurroundNEXO across diverse surround-view driving scenes, highlighting cross-view geometry and metric consistency.

BibTeX

@article{yuan2026surroundnexo,
  title   = {SurroundNEXO: Ego-Centric Metric Bridging for Spatially Consistent Geometry in Autonomous Driving},
  author  = {Yuan, Shuai and Tang, Runxi and Ji, Yuzhou and Ge, Fudong and Wang, Hanshi and Wang, Yifei and Zeng, Xianming and Xu, Jianyun and Liu, Xingliang and Wang, Yanfeng and Zhang, Zhipeng},
  journal = {arXiv preprint arXiv:2606.16960},
  year    = {2026}
}