Silk

The Linked Data Integration Framework

Robert Isele (eccenca GmbH)
Anja Jentzsch (Hasso Plattner Institut)
Christian Bizer (University of Mannheim)
Julius Volz (Google)
Petar Petrovski (University of Mannheim)

Silk is an open source framework for integrating heterogeneous data sources. Its primary use cases are linking related data items across different data sources and applying transformations to structured data.

Silk is based on the Linked Data paradigm, which is built on two simple ideas: first, RDF provides an expressive data model for representing structured information; second, RDF links are set between entities in different data sources. Background information about Linked Data and the vision of the Web of Data can be found in the overview article "Linked Data - The Story So Far" and in the Linked Data book.

Linking Data Sources

Using the declarative Silk - Link Specification Language (Silk-LSL), developers can specify which types of RDF links should be discovered between data sources as well as which conditions data items must fulfill in order to be interlinked. These link conditions may combine various similarity metrics and can take the graph around a data item into account, which is addressed using an RDF path language. Silk accesses the data sources that should be interlinked via the SPARQL protocol and can thus be used against local as well as remote SPARQL endpoints. Link Specifications can be created using the Silk Workbench graphical user interface or manually in XML.
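To make the idea of a link condition that combines several similarity metrics more concrete, the following is a minimal conceptual sketch in Python. It is not Silk-LSL and does not use any Silk API: real link specifications are written in XML or built in the Workbench, and the data would come from SPARQL endpoints. The `City` record, the weights, the `max_diff` value, the threshold, and the example URIs are all assumptions chosen purely for illustration.

```python
from dataclasses import dataclass


def levenshtein(a: str, b: str) -> int:
    """Edit distance computed with the classic dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def string_similarity(a: str, b: str) -> float:
    """Case-insensitive edit distance normalised to a similarity in [0, 1]."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a.lower(), b.lower()) / longest


def numeric_similarity(x: float, y: float, max_diff: float) -> float:
    """1.0 for equal values, falling linearly to 0.0 at a difference of max_diff."""
    return max(0.0, 1.0 - abs(x - y) / max_diff)


@dataclass
class City:
    """Hypothetical data item; in Silk these values would be reached via RDF paths."""
    uri: str
    label: str
    population: int


def link_score(a: City, b: City) -> float:
    """Aggregate two metrics with a weighted average (weights are arbitrary)."""
    label_sim = string_similarity(a.label, b.label)
    pop_sim = numeric_similarity(a.population, b.population, max_diff=100_000)
    return 0.7 * label_sim + 0.3 * pop_sim


def discover_links(source, target, threshold=0.9):
    """Yield an owl:sameAs link for every pair whose score reaches the threshold."""
    for a in source:
        for b in target:
            if link_score(a, b) >= threshold:
                yield (a.uri, "owl:sameAs", b.uri)


# Usage with made-up data items from two hypothetical sources.
source = [City("http://example.org/a/Berlin", "Berlin", 3_645_000)]
target = [
    City("http://example.org/b/berlin", "berlin", 3_650_000),
    City("http://example.org/b/bern", "Bern", 134_000),
]
links = list(discover_links(source, target))
```

The sketch compares every source item with every target item, which is only workable for tiny inputs; it is meant to show how metrics, aggregation, and a threshold fit together, not how a link discovery engine would execute them efficiently.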

The linking process is based on the Silk Link Discovery Engine which offers the following features:

Data Transformations

While the main part of an integration workflow lies in interlinking data sources, data sets coming from different sources sometimes require harmonization of their schemata and data formats prior to interlinking. For this purpose, Silk enables the user to create and execute lightweight transformation rules. Transformation rules may be used for:
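As a rough illustration of what a lightweight transformation rule can do, the sketch below harmonizes both the schema (property names) and the data format (whitespace, capitalization, date layout) of a record. It is a conceptual Python example, not Silk's rule format or API; the `RULES` table, the helper functions, and the property names used as targets are assumptions made for the example.

```python
import re


def normalize_whitespace(value: str) -> str:
    """Collapse runs of whitespace and trim both ends."""
    return re.sub(r"\s+", " ", value).strip()


def to_iso_date(value: str) -> str:
    """Rewrite DD.MM.YYYY as YYYY-MM-DD; leave any other value untouched."""
    match = re.fullmatch(r"(\d{2})\.(\d{2})\.(\d{4})", value)
    if not match:
        return value
    day, month, year = match.groups()
    return f"{year}-{month}-{day}"


# Hypothetical rules: source property -> (target property, chain of transforms).
RULES = {
    "name": ("rdfs:label", [normalize_whitespace, str.title]),
    "born": ("dbo:birthDate", [normalize_whitespace, to_iso_date]),
}


def apply_rules(record: dict, rules: dict = RULES) -> dict:
    """Map a record to the target schema, applying each transform in order."""
    result = {}
    for source_prop, value in record.items():
        if source_prop not in rules:
            continue  # properties without a rule are dropped
        target_prop, transforms = rules[source_prop]
        for transform in transforms:
            value = transform(value)
        result[target_prop] = value
    return result


# Usage with a made-up input record.
harmonized = apply_rules({"name": "  ada   lovelace ", "born": "10.12.1815", "misc": "x"})
```

Keeping each rule as a property mapping plus an ordered chain of small functions mirrors the declarative spirit of the text: the rule table describes what should happen, and the generic `apply_rules` function executes it.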

Silk Workbench

Silk Workbench is a web application which guides the user through the process of interlinking different data sources.

Silk Workbench offers the following features:

Documentation of the Silk Workbench is available in the Wiki.

Silk Command Line Applications

In addition to the Workbench, Silk provides three different command line applications for executing link specifications:

Silk Free Text Preprocessor

The main goal of the Silk Free Text Preprocessor is to produce a structured representation of data that contains or is derived from free text. The tool takes two inputs: an RDF file whose properties have free-text values, and an additional RDF file containing structured data that is used to learn the extraction model. Based on the learned model, the tool extracts new property-value pairs from the free text. The resulting output is an RDF dump file containing the extracted structured values. Using a declarative XML-based language, a user can specify which extraction methods to use.
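The learn-then-extract flow described above can be sketched in a few lines of Python. The text does not say which extraction methods the tool implements, so this example substitutes a simple dictionary lookup as a stand-in: the "model" is just a table of known values learned from structured data. The function names, the `ex:` properties, and the sample data are all hypothetical, and plain tuples stand in for RDF input and output.

```python
import re


def learn_model(structured_pairs):
    """Build a lookup from known (lowercased) values to their property."""
    model = {}
    for prop, value in structured_pairs:
        model[value.lower()] = prop
    return model


def extract(text, model):
    """Return sorted (property, value) pairs whose value occurs as a whole word or phrase."""
    lowered = text.lower()
    pairs = []
    for value, prop in model.items():
        # Word boundaries avoid matching a value inside a longer word.
        if re.search(r"\b" + re.escape(value) + r"\b", lowered):
            pairs.append((prop, value))
    return sorted(pairs)


# Usage: structured data "teaches" the model, which is then applied to free text.
structured = [
    ("ex:color", "Red"),
    ("ex:color", "Blue"),
    ("ex:material", "Leather"),
]
model = learn_model(structured)
extracted = extract("Stylish red jacket made of soft leather.", model)
```

A real extraction model would need to cope with unseen values, ambiguity, and multi-valued properties; the lookup table only shows how structured data can drive the extraction of property-value pairs from text.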

Documentation of the Silk Free Text Preprocessor is available in the Wiki.

Acknowledgments

This work was supported in part by Vulcan Inc. as part of its Project Halo and by the EU FP7 project LOD2 - Creating Knowledge out of Interlinked Data (Grant No. 257943).