This research proposes a method that integrates computer vision and human-robot collaboration to help industrial robots recognize and assemble objects autonomously on site, bridging the gap between model information and physical construction tasks. It explores practical applications of robotic construction technology on the construction site by integrating interdisciplinary resources and techniques. The proposed approach combines artificial intelligence, computational techniques, robotic assembly methods, Building Information Modeling (BIM), and visual recognition systems into an integrated workflow. The paper develops a BIM-assisted Autonomous Robotic Recognition and Pose Estimation System (ARROS), which converts digital design data and object characteristics into actionable instructions at the construction site. Using real-time visual guidance and pose estimation, ARROS enables robotic manipulators to handle construction elements autonomously despite variations in material, shape, color, or environmental conditions. To support real-time decision-making under dynamic site conditions, the study presents a low-cost, open-source, visually guided closed-loop control system. The recognition process comprises three stages: (1) BIM-assisted virtual object scanning, (2) object depth recognition, and (3) on-site object grasp pose estimation. A depth camera is mounted in an eye-in-hand configuration, and an interactive human-robot collaboration interface gives users real-time feedback, allowing immediate corrections or manual interventions and thereby improving the system’s overall flexibility and usability. The visual recognition method was further tested and discussed under various environmental and physical conditions on construction sites, with particular emphasis on its applicability in semi-structured scenarios. Ultimately, the system and methods developed in this research aim to advance robotic automation for on-site assembly and to explore novel applications in intelligent construction.
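
The eye-in-hand arrangement mentioned above implies a chain of coordinate frames: a grasp pose estimated in the camera frame must be expressed in the robot base frame before the manipulator can act on it. The following is a minimal sketch of that standard frame composition, not the ARROS implementation. The function names, the yaw-only rotations, and all numeric values are hypothetical, and the hand-eye calibration (camera pose relative to the end effector) is assumed to be known.

```python
import math


def matmul4(a, b):
    """Multiply two 4x4 homogeneous transforms stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def make_transform(yaw_rad, translation):
    """Build a 4x4 transform from a rotation about z and a translation.

    Yaw-only rotation is a simplifying assumption for this sketch.
    """
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    x, y, z = translation
    return [
        [c, -s, 0.0, x],
        [s, c, 0.0, y],
        [0.0, 0.0, 1.0, z],
        [0.0, 0.0, 0.0, 1.0],
    ]


def grasp_pose_in_base(T_base_ee, T_ee_cam, T_cam_obj):
    """Chain base->end effector, end effector->camera, camera->object."""
    return matmul4(matmul4(T_base_ee, T_ee_cam), T_cam_obj)


# Hypothetical values, in metres and radians.
T_base_ee = make_transform(math.pi / 2, (0.5, 0.0, 0.8))  # from robot kinematics
T_ee_cam = make_transform(0.0, (0.0, 0.05, 0.1))          # from hand-eye calibration
T_cam_obj = make_transform(0.0, (0.2, 0.0, 0.6))          # from depth-based pose estimate

T_base_obj = grasp_pose_in_base(T_base_ee, T_ee_cam, T_cam_obj)
position = [row[3] for row in T_base_obj[:3]]  # object position in the base frame
```

In a closed-loop setting, this composition would be re-evaluated each time the camera produces a new estimate, since `T_base_ee` changes as the arm moves while `T_ee_cam` stays fixed.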