arXiv Preprint
Sim-to-Real Fruit Detection Using Synthetic Data: Quantitative Evaluation and Embedded Deployment with Isaac Sim
Synthetic image generated in NVIDIA Isaac Sim (left) and in-domain real image of plastic fruit (right).
Motivation
The motivation for this work was to evaluate how well object detection models trained on synthetic data perform on real images. In industrial settings, collecting and annotating large real-world datasets is time-consuming and often requires production stops. If synthetic data could replace the real dataset, at least partially, detection models could be trained and deployed more efficiently.
In addition, the selected model is deployed on an edge device. Such devices typically have limited computational power, which makes the trade-off between inference latency and detection precision challenging.
The primary goal was to observe how synthetic, real, and hybrid training datasets influence performance on two test sets: one in-domain and one with a domain shift.
Abstract
This study investigates the effectiveness of synthetic data for sim-to-real transfer in object detection under constrained data conditions and embedded deployment requirements. Synthetic datasets were generated in NVIDIA Isaac Sim and combined with limited real-world fruit images to train YOLO-based detection models under real-only, synthetic-only, and hybrid regimes. Performance was evaluated on two test datasets: an in-domain dataset with conditions matching the training data and a domain shift dataset containing real fruit and different background conditions. Results show that models trained exclusively on real data achieve the highest accuracy, while synthetic-only models exhibit reduced performance due to a domain gap. Hybrid training strategies significantly improve performance compared to synthetic-only approaches and achieve results close to real-only training while reducing the need for manual annotation. Under domain shift conditions, all models show performance degradation, with hybrid models providing improved robustness. The trained models were successfully deployed on a Jetson Orin NX using TensorRT optimization, achieving real-time inference performance. The findings highlight that synthetic data is most effective when used in combination with real data and that deployment constraints must be considered alongside detection accuracy.
Key challenges addressed
The primary challenges of this work are related to the effective use of synthetic data for sim-to-real transfer in perception systems. A key difficulty lies in the domain gap between simulated and real-world data, which can significantly impact model performance when deployed outside of controlled conditions.
In addition, the work addresses the challenge of balancing dataset composition, evaluating how synthetic, real, and hybrid datasets influence detection accuracy and robustness. Special attention is given to generalization under domain shift, where environmental conditions differ from the training distribution.
Finally, deployment on resource-constrained embedded hardware introduces trade-offs between inference latency and detection performance, requiring careful optimization to achieve real-time operation while maintaining acceptable accuracy.
- Domain gap between synthetic and real-world data
- Dataset composition and hybrid training strategies
- Generalization under domain shift conditions
- Trade-off between accuracy and real-time inference on embedded hardware
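The latency/accuracy trade-off listed above can be framed as a constrained selection: among candidate models or export settings measured on the target device, keep those that fit a per-frame time budget and pick the most accurate. The following is a minimal sketch under stated assumptions. The `Candidate` fields, the function name `select_under_budget`, and the frames-per-second target are hypothetical and are not taken from the paper, which does not describe its selection procedure in this summary.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """One model/export configuration measured on the target device (hypothetical)."""
    name: str          # illustrative label, e.g. model size + precision
    latency_ms: float  # measured mean per-frame latency on the device
    accuracy: float    # assumed scalar detection metric on a validation set


def select_under_budget(candidates: Sequence[Candidate],
                        fps_target: float) -> Optional[Candidate]:
    """Return the most accurate candidate whose latency fits the frame budget.

    The budget is 1000 / fps_target milliseconds per frame. For equal accuracy
    the lower-latency candidate wins. Returns None if nothing is fast enough.
    """
    if fps_target <= 0:
        raise ValueError("fps_target must be positive")
    budget_ms = 1000.0 / fps_target
    feasible = [c for c in candidates if c.latency_ms <= budget_ms]
    if not feasible:
        return None
    # Highest accuracy first; ties broken in favour of the faster candidate.
    return max(feasible, key=lambda c: (c.accuracy, -c.latency_ms))
```

Treating latency as a hard constraint and accuracy as the objective mirrors the real-time requirement: a more accurate model that misses the frame budget is not usable on the device.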
Results / Findings
The evaluation confirms that dataset composition has a significant impact on detection performance in sim-to-real scenarios. Models trained exclusively on real data achieve the highest accuracy under in-domain conditions, while synthetic-only models exhibit reduced performance due to the domain gap between simulated and real-world data.
Hybrid training approaches, combining synthetic and limited real data, provide a strong balance between performance and data efficiency. These models achieve results close to real-only training while reducing the need for extensive manual annotation.
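The real-only, synthetic-only and hybrid regimes can be expressed as a single mixing parameter over two image pools. The sketch below assumes images are referenced by file path, that labels are resolved from those paths elsewhere, and that each regime is drawn at a fixed total size. The function name, the `real_fraction` parameter and the sampling scheme are illustrative assumptions, not the exact procedure used in the paper.

```python
import random
from typing import List, Sequence


def build_hybrid_split(synthetic: Sequence[str], real: Sequence[str],
                       real_fraction: float, total: int,
                       seed: int = 0) -> List[str]:
    """Sample a shuffled training list with a fixed share of real images.

    real_fraction = 0.0 gives a synthetic-only regime, 1.0 a real-only regime,
    and values in between a hybrid regime (hypothetical parameterisation).
    """
    if not 0.0 <= real_fraction <= 1.0:
        raise ValueError("real_fraction must be in [0, 1]")
    n_real = round(total * real_fraction)
    n_syn = total - n_real
    if n_real > len(real) or n_syn > len(synthetic):
        raise ValueError("not enough images for the requested mix")
    rng = random.Random(seed)  # fixed seed so splits are reproducible
    # Draw without replacement from each pool, then shuffle the union.
    mix = rng.sample(list(real), n_real) + rng.sample(list(synthetic), n_syn)
    rng.shuffle(mix)
    return mix
```

Holding the total size fixed while varying `real_fraction` keeps the comparison between regimes about composition rather than dataset size.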
Under domain shift conditions, all models experience performance degradation; however, hybrid models demonstrate improved robustness compared to synthetic-only approaches. This indicates that synthetic data is most effective when used to complement, rather than replace, real-world datasets.
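Robustness under domain shift can be compared across regimes by how far each model's score falls from the in-domain test set to the domain-shift test set. A minimal sketch, assuming one scalar accuracy metric per regime and test set; the function name and dictionary layout are hypothetical, and no results from the paper are encoded here.

```python
from typing import Dict, Tuple


def domain_shift_drop(in_domain: Dict[str, float],
                      shifted: Dict[str, float]) -> Dict[str, Tuple[float, float]]:
    """For each training regime, return (absolute drop, relative drop).

    Both inputs map a regime name (e.g. "real", "synthetic", "hybrid") to the
    same metric measured on the in-domain and the domain-shift test set.
    Relative drop is the absolute drop divided by the in-domain score.
    """
    result = {}
    for regime, score_in in in_domain.items():
        drop = score_in - shifted[regime]
        rel = drop / score_in if score_in > 0 else float("nan")
        result[regime] = (drop, rel)
    return result
```

Reporting both values avoids a misleading comparison: a model with a low in-domain score can show a small absolute drop while still losing a large share of its performance.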
Deployment on embedded hardware shows that real-time inference is achievable, but requires careful optimization to balance latency and detection precision. The results highlight the importance of considering deployment constraints alongside model accuracy during system design.
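Whether a deployed model meets a real-time requirement is usually checked by timing inference on the target device. The sketch below times any per-frame inference callable, for example a wrapper around a TensorRT engine, and reports mean latency, 95th-percentile latency and throughput. The callable, the warm-up count and the choice of percentile are assumptions for illustration; the paper's exact measurement protocol is not described in this summary.

```python
import math
import statistics
import time
from typing import Callable, Dict, Sequence


def benchmark_latency(infer: Callable[[object], object],
                      frames: Sequence[object],
                      warmup: int = 10) -> Dict[str, float]:
    """Time a per-frame inference callable and summarise its latency.

    Assumes `infer` is synchronous, i.e. returns only once results are ready.
    Warm-up calls are discarded because first calls on GPU runtimes often
    include one-off initialisation costs.
    """
    if not frames:
        raise ValueError("frames must not be empty")
    for i in range(warmup):
        infer(frames[i % len(frames)])
    samples_ms = []
    for frame in frames:
        t0 = time.perf_counter()
        infer(frame)
        samples_ms.append((time.perf_counter() - t0) * 1000.0)
    samples_ms.sort()
    # Nearest-rank 95th percentile of the sorted samples.
    p95_index = max(0, math.ceil(0.95 * len(samples_ms)) - 1)
    mean_ms = statistics.fmean(samples_ms)
    return {
        "mean_ms": mean_ms,
        "p95_ms": samples_ms[p95_index],
        "fps": 1000.0 / mean_ms if mean_ms > 0 else float("inf"),
    }
```

The 95th-percentile figure is worth reporting alongside the mean, since occasional slow frames can break a real-time loop even when the average rate looks sufficient.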
Paper
Full paper available on arXiv.