Jetank: Perception-Guided Toy Picking

Demonstration of perception-driven target selection and pick execution on resource-limited embedded hardware.

Embedded constraints · Closed-loop behavior · Robustness-first

The video shows the Jetank detection-and-pick cycle: detect → approach → pick.

Motivation

The motivation for this project was primarily educational. The Jetank is a commercially available mobile robot equipped with a four-degree-of-freedom manipulator, a monocular RGB camera, and a Jetson Nano 4GB running the JetBot software stack.

Given the limited compute and memory of the Jetson platform, the project focused on evaluating perception-driven manipulation under strict hardware constraints. A lightweight, pre-trained vision model was used for object classification; it was not fine-tuned for object detection.
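As a rough illustration, a lightweight pre-trained classifier can be invoked per frame along these lines. The model choice (MobileNetV2 via torchvision), input size, and normalization constants are assumptions for the sketch, not the project's exact configuration:

```python
import torch
import torchvision.transforms as T
from torchvision.models import mobilenet_v2

# Lightweight ImageNet classifier; any comparably small model fits here.
model = mobilenet_v2(pretrained=True).eval().to("cuda")

# Standard ImageNet preprocessing (assumed, not the project's exact setup).
preprocess = T.Compose([
    T.ToPILImage(),
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def classify(frame):
    """Return (class_index, confidence) for a single RGB camera frame."""
    x = preprocess(frame).unsqueeze(0).to("cuda")
    with torch.no_grad():
        probs = torch.softmax(model(x), dim=1)
    conf, idx = probs.max(dim=1)
    return idx.item(), conf.item()
```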

The primary goal was to observe the trade-offs between detection accuracy, classification reliability, and inference latency on resource-limited embedded hardware, and to assess the feasibility of closed-loop pick execution under these constraints.

System overview

The solution is structured as a modular perception–planning–execution pipeline deployed on embedded hardware. Perception and motion control are decoupled into separate runtime components, enabling independent development and evaluation of visual detection and robot behavior.
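One way to realize this decoupling is to give each stage a narrow interface, so the vision side and the robot side can be developed and tested in isolation. The class and method names below are illustrative, not taken from the project code:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Detection:
    label: str
    confidence: float
    bbox: Tuple[int, int, int, int]  # (x, y, w, h) in pixels

class Perception:
    """Wraps the vision model; knows nothing about the robot."""
    def detect(self, frame) -> List[Detection]: ...

class Planner:
    """Filters detections and selects one target; knows nothing about motors."""
    def select_target(self, detections: List[Detection]) -> Optional[Detection]: ...

class Controller:
    """Turns the target estimate into motion; knows nothing about vision."""
    def step_towards(self, target: Detection) -> bool: ...  # True once aligned
    def execute_pick(self) -> None: ...
```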

The perception module provides object localization results that are consumed by a planning layer, which filters out irrelevant scene elements and selects a suitable target object. Target selection relies on simple geometric properties derived from the visual observations, so it stays reliable within the platform's tight compute budget.
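A geometric scoring rule of this kind can be as simple as preferring large, well-centered bounding boxes, which adds almost no compute on top of the detections themselves. The area threshold and trade-off weight below are illustrative values:

```python
def select_target(detections, frame_w, frame_h, min_area=400):
    """Score candidates by apparent size and closeness to the image center."""
    best, best_score = None, float("-inf")
    for det in detections:
        x, y, w, h = det.bbox
        area = w * h
        if area < min_area:           # drop tiny / irrelevant scene elements
            continue
        cx, cy = x + w / 2, y + h / 2
        # Normalized distance from the image center, in [0, ~0.71]
        off = ((cx / frame_w - 0.5) ** 2 + (cy / frame_h - 0.5) ** 2) ** 0.5
        score = area - 2000.0 * off   # illustrative trade-off weight
        if score > best_score:
            best, best_score = det, score
    return best
```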

Motion commands are generated from the estimated target pose and executed in a closed-loop manner, continuously updating perception and planning until alignment criteria are satisfied. Once the robot is sufficiently aligned with the target, a predefined pick routine is executed.
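The alignment step can be sketched as a proportional rule on the target's horizontal pixel offset: steer until the bounding-box center falls inside a tolerance band, then hand over to the scripted pick. The gain, tolerance, and motor call (in the style of the JetBot Robot API) are assumptions:

```python
ALIGN_TOL = 0.05   # acceptable normalized horizontal offset (assumed)
KP = 0.4           # proportional steering gain (assumed)

def step_towards(robot, target, frame_w):
    """Run one control step; return True once the alignment criterion holds."""
    x, _, w, _ = target.bbox
    err = (x + w / 2) / frame_w - 0.5      # signed offset from image center
    if abs(err) < ALIGN_TOL:
        return True                        # aligned: caller triggers the pick
    robot.set_motors(KP * err, -KP * err)  # turn toward the target in place
    return False
```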

The overall system follows an iterative sense–plan–act cycle optimized for resource-constrained platforms, prioritizing robustness and predictable behavior over model complexity.
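Put together, the sense–plan–act cycle reduces to a single loop that re-runs perception and planning on every iteration, so stale targets are naturally discarded as the scene changes. A minimal sketch, reusing the interfaces from above (the camera interface is hypothetical):

```python
def run_pick_cycle(camera, perception, planner, controller):
    """Repeat sense -> plan -> act until a pick completes."""
    while True:
        frame = camera.read()                       # sense
        detections = perception.detect(frame)
        target = planner.select_target(detections)  # plan
        if target is None:
            continue                                # nothing suitable; re-sense
        if controller.step_towards(target):         # act: one motion step
            controller.execute_pick()               # predefined pick routine
            break
```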

Key challenges addressed

The primary challenges of the project were the limited computational and power resources of the embedded platform, sensitivity to lighting conditions and partial occlusions, and achieving reliable pick execution without precise environment calibration.

A significant engineering challenge was reducing the overall cycle time, which initially exceeded one minute due to repeated initialization overhead in the perception pipeline. Restructuring the runtime brought the cycle time down to approximately 1.5 seconds, making closed-loop operation feasible on the target hardware.
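The fix amounts to hoisting one-time setup out of the per-cycle path, paying it once at startup. A hedged before/after sketch, where load_model is a hypothetical helper standing in for model load and warm-up:

```python
# Before: the perception stack was re-initialized on every cycle,
# pushing the total cycle time past one minute.
def cycle_slow(camera):
    model = load_model()      # hypothetical helper: model load + warm-up
    frame = camera.read()
    return model(frame)

# After: initialization is hoisted out of the loop and paid once at
# startup; only inference remains per cycle (~1.5 s total).
model = load_model()

def cycle_fast(camera):
    frame = camera.read()
    return model(frame)
```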