English abstract |
"Big data" has become a popular term in clinical and public health research in recent years. Because there is no authoritative definition of "big data," different authors and institutions have defined its scope and content differently, with "large amounts of data," "a variety of data sources," and "unstructured data" as commonly cited attributes. In this article, the authors focus on the most important ingredient of the big data concept – utilizing data from disparate sources and taking advantage of advances in information technology to draw valid inferences that are useful to medical and public health practice. While big data holds promise for generating and testing hypotheses with massive amounts of data, the fundamental principles of making inferences from observational data still apply. From this perspective, the discipline of epidemiology is the foundation of current "big data" research. Research with large amounts of data may substantially reduce random error, but systematic errors cannot be addressed by the size of the datasets and need to be reduced through sound research methods and robust analysis. Without appropriate study design and analysis, large datasets are likely to yield precise but wrong answers. Examples from the published literature are used in the article to illustrate these principles. Lastly, privacy and confidentiality considerations for research utilizing health data are discussed. |