英文摘要 |
In order to increase the reuse value of existing datasets, it is now becoming a general practice to add semantic links among the records in a dataset, and to link these records to external resources. The enriched datasets are published on the web for both human and machine to consume and re-purpose. In this paper, we make use of publicly available structured records from a digital archive catalogue, and we demonstrate a principled approach to converting the records into semantically rich and interlinked resources for all to reuse. While exploring the various issues involved in the process of reusing and re-purposing existing datasets, we review the recent progress in the field of Linked Open Data (LOD), and examine twelve well-known knowledge bases built with a Linked Data approach. We also discuss the general issues of data quality, metadata vocabularies, and data provenance. The concrete outcome of this research work is the following: (1) a website data.odw.tw that hosts more than 840,000 semantically enriched catalogue records across multiple subject areas, (2) a lightweight ontology voc4odw for describing data reuse and provenance, among others, and (3) a set of open source software tools available to all to perform the kind of data conversion and enrichment we did in this research. We have used and extended CKAN (The Comprehensive Knowledge Archive Network) as a platform to host and publish Linked Data. Our extensions to CKAN is open sourced as well. As the records we drawn from the originally catalogue are released under the Creative Commons licenses, the semantically enriched resources we now re-publish on the Web are free for all to reuse as well. |