Flying modular robots have the potential to rapidly form temporary structures. In the literature, docking actions rely on external systems and indoor infrastructures for relative pose estimation. In contrast to related work, we provide local estimation during the self-assembly process to avoid dependency on external systems. In this paper, we introduce ModQuad-Vi, a flying modular robot that is aimed to operate in outdoor environments. We propose a new robot design and vision-based docking method. Our design is based on a quadrotor platform with onboard computation and visual perception. Our control method is able to accurately align modules for docking actions. Additionally, we present the dynamics and a geometric controller for the aerial modular system. Experiments validate the vision-based docking method with successful results.