English Abstract |
The Hadoop distributed computing architecture provides Big Data processing solutions for information systems, and with the development of virtualization technologies, building computing nodes on virtual machines has become an industry goal for system integration. Although virtualization allows highly flexible deployment of network and hardware resources, virtualized Hadoop clusters have become a major bottleneck because a virtual machine can hardly achieve the computing performance of a physical host. It is also difficult for administrators to assess the performance impact of virtualization on a cloud platform. In this study, we deploy distributed computing clusters based on Hadoop, integrate heterogeneous server computing resources, and use HBase as the main database storage system. We also adopt Hive, Sqoop, and other tools to exchange data across heterogeneous environments. The Hadoop clusters are deployed on both physical hosts and virtual machines. To assess cluster computing performance efficiently, this research integrates existing performance models and compares measured execution performance with predicted performance to verify the validity of the resulting model. All results show that the modified model is applicable to this system architecture. We then consider the impact factors of virtualization, apply the performance model to the virtualized Hadoop cluster, and verify the performance differences between physical and virtual hosts. Finally, we propose an improved virtualization solution for cloud platforms that enables Hadoop on a virtual host to achieve the same performance as on a physical host. |