Miniaturized grippers that possess an untethered structure are suitable for a wide range of tasks, ranging from micromanipulation and microassembly to minimally invasive surgical interventions. In order to robustly perform such tasks, it is critical to properly estimate their overall configuration. Previous studies on tracking and control of miniaturized agents estimated mainly their 2D pixel position, mostly using cameras and optical images as a feedback modality. This paper presents a novel solution to the problem of estimating and tracking the 3D position, orientation and configuration of the tips of submillimeter grippers from marker-less visual observations. We consider this as an optimization problem, which is solved using a variant of the Particle Swarm Optimization algorithm. The proposed approach has been implemented in a Graphics Processing Unit (GPU) which allows a user to track the submillimeter agents online. The proposed approach has been evaluated on several image sequences obtained from a camera and on B-mode ultrasound images obtained from an ultrasound probe. The sequences show the grippers moving, rotating, opening/closing and grasping biological material. Qualitative results obtained using both hydrogel (soft) and metallic (hard) grippers with different shapes and sizes ranging from 750 microns to 4 mm (tip to tip), demonstrate the capability of the proposed method to track the agent in all the video sequences. Quantitative results obtained by processing synthetic data reveal a tracking position error of 25 ±7μm and orientation error of 1.7 ± 1.3 degrees. We believe that the proposed technique can be applied to different stimuli responsive miniaturized agents, allowing the user to estimate the full configuration of complex agents from visual marker-less observations.