Optimizing real-time stereo image retargeting for AR/VR: Lightweight disparity CNNs on AI-Driven edge architectures

Research Article

DOI: 10.1080/20421338.2025.2601663
Author(s): Mahendra T. Jagtap (University of South Florida, Muma College of Business, Sarasota-Manatee Campus, USA); Bhuvan Unhelkar (University of South Florida, USA); Pravin R. Kshirsagar (JD College of Engineering and Management, India); Nitin Rakesh (Symbiosis Institute of Technology, India); R. Thiagarajan (VelTech multiTech Dr. Rangarajan Dr. Sakunthala Engineering College, India); Vishal Patil (Loknete Gopinathji Munde Institute of Engineering Education & Research, Nashik and SPPU, India)

Abstract

Real-time stereo image retargeting for augmented reality (AR) and virtual reality (VR) requires precise per-pixel depth estimation and ultra-low latency on resource-limited edge devices. Current disparity convolutional neural networks (CNNs) and retargeting pipelines cannot meet both requirements at once. This paper presents EASNet, a compact end-to-end framework that unifies geometry-aware proposal generation, parallax-aligned feature encoding, sparse candidate aggregation, and uncertainty-guided refinement to enable high-fidelity stereo retargeting on edge architectures. EASNet combines four components: Epipolar-Adaptive Disparity Proposals (EADP) for search space reduction; a Parallax-Directed Deformable Encoder (PaDDE) for improved matching in repetitive and low-texture areas; a Sparse Epipolar Candidate Volume (SECV) with Edge-Consistent Routing (ECR) for efficient, boundary-preserving cost aggregation; and Lightweight Uncertainty-Guided Refinement (LUGR) for sub-pixel structure and occlusion correction. Evaluated on high-resolution indoor stereo data, EASNet attains a favourable trade-off between accuracy and efficiency (≈0.21 M parameters, 3.42 GFLOPs) while improving the disparity fidelity and visual coherence required for retargeting (reported EPE ≈ 1.78 px, D1 ≈ 5.02%, VC ≈ 93.7%). The design emphasizes quantization compatibility and deterministic latency, enabling practical deployment on AR/VR edge devices. We analyze ablations, per-scene behaviour, and k-fold stability, and we discuss limitations (indoor bias, extreme occlusion, large baselines) and future extensions of EASNet.
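
For readers unfamiliar with the two disparity metrics quoted in the abstract, the following is a minimal sketch of how end-point error (EPE) and a D1 outlier rate are commonly computed. It is not taken from the paper. It assumes the KITTI-style D1 convention (a pixel is an outlier when its error exceeds both 3 px and 5% of the true disparity) and assumes that ground-truth values of zero or below mark invalid pixels. The function name `disparity_metrics` and its parameters are hypothetical, and disparity maps are passed as flat sequences to keep the example dependency-free.

```python
from typing import Sequence, Tuple


def disparity_metrics(
    pred: Sequence[float],
    gt: Sequence[float],
    abs_thresh: float = 3.0,   # assumed KITTI-style absolute threshold (px)
    rel_thresh: float = 0.05,  # assumed KITTI-style relative threshold
) -> Tuple[float, float]:
    """Return (EPE in pixels, D1 in percent) over valid pixels.

    Assumption: ground-truth disparities <= 0 mark invalid pixels
    and are excluded from both metrics.
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same length")

    # Keep (absolute error, ground truth) pairs for valid pixels only.
    pairs = [(abs(p - g), g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth pixels")

    # EPE: mean absolute disparity error.
    epe = sum(err for err, _ in pairs) / len(pairs)

    # D1: share of pixels whose error exceeds both thresholds.
    outliers = sum(
        1 for err, g in pairs if err > abs_thresh and err > rel_thresh * g
    )
    d1 = 100.0 * outliers / len(pairs)
    return epe, d1


# Tiny illustrative input; the last pixel has no valid ground truth.
pred = [10.0, 20.0, 34.0, 5.0]
gt = [10.5, 20.0, 30.0, 0.0]
epe, d1 = disparity_metrics(pred, gt)
```

On this toy input the three valid pixels have errors of 0.5, 0 and 4 px, so only the third counts as an outlier. The paper's own evaluation protocol may use different thresholds or validity masks.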

Journal: African Journal of Science, Technology, Innovation and Development