Linked Open Data Glossary

This glossary lists terms related to publishing and consuming either Linked Data in the enterprise or Linked Open Data on the public Web. It draws mainly on two W3C sources: the Linked Data Glossary and the glossary of Data on the Web Best Practices.

 

5 Star Linked Open Data

5 Star Linked Open Data refers to an incremental framework for deploying data. Tim Berners-Lee, the inventor of the Web and initiator of the Linked Data project, suggested a 5 star deployment scheme for Linked Open Data. The scheme is cumulative: each additional star presumes that the data meets the criteria of the previous steps. 5 Star Linked Open Data includes an Open License (expression of rights) and assumes publication on the public Web.
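The cumulative rule can be sketched in a few lines of Python. The criteria paraphrase Berners-Lee's published 5 star scheme; the key names and the `star_rating` function are assumptions made for this illustration, not part of any standard.

```python
# Criteria in star order, paraphrasing Berners-Lee's 5 star scheme.
# The key names are hypothetical and chosen only for this sketch.
CRITERIA = [
    "on_web_open_license",     # 1 star: on the Web under an open license
    "machine_readable",        # 2 stars: structured, machine readable data
    "non_proprietary_format",  # 3 stars: non-proprietary format (e.g. CSV)
    "uses_uris",               # 4 stars: URIs identify things (RDF)
    "links_to_other_data",     # 5 stars: links to other data sets
]


def star_rating(dataset: dict) -> int:
    """Count stars cumulatively: stop at the first unmet criterion."""
    stars = 0
    for key in CRITERIA:
        if not dataset.get(key, False):
            break
        stars += 1
    return stars


# A CSV file published on the Web under an open license.
csv_dataset = {
    "on_web_open_license": True,
    "machine_readable": True,
    "non_proprietary_format": True,
}
print(star_rating(csv_dataset))
```

Because the loop stops at the first unmet criterion, a data set that uses URIs but lacks an open license earns no stars at all in this sketch.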

AGROVOC
A controlled vocabulary covering all areas of interest of the Food and Agriculture Organization (FAO) of the United Nations, including food, nutrition, agriculture, fisheries, forestry and the environment. It is published by FAO and edited by a community of experts.
API
An Application Programming Interface (API) is an abstraction implemented in software that defines how others should make use of a software package such as a library or other reusable program. APIs are used to provide developers access to data and functionality from a given system.
Controlled Vocabulary
Carefully selected sets of terms that are used to describe units of information; used to create taxonomies, thesauri and ontologies. In traditional settings the terms in a controlled vocabulary are words or phrases. In a linked data setting they are normally assigned unique identifiers (URIs), which in turn link to descriptive phrases.
Comma Separated Values (CSV)
A tabular data format in which columns of information are separated by comma characters. CSV is a non-proprietary format, and CSV files are considered 3-star data on the 5-star scale.
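As a minimal sketch of why CSV counts as machine readable, the following Python reads a small table using only the standard library. The sample rows and column names are invented for illustration.

```python
import csv
import io

# Hypothetical sample data, invented for this illustration.
raw = "country,crop,tonnes\nKenya,maize,3800\nPeru,potato,5300\n"

# DictReader uses the first line as column names.
reader = csv.DictReader(io.StringIO(raw))
rows = list(reader)

for row in rows:
    # Every CSV value arrives as a string; convert numbers explicitly.
    print(row["country"], row["crop"], int(row["tonnes"]))
```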
Creative Commons Licenses
Licenses that include legal statements by the owner of copyright in intellectual property specifically allowing people to use or redistribute the copyrighted work in accordance with conditions specified therein. See also About Creative Commons Licenses.
Dataset
A dataset is defined as a collection of data, published or curated by a single agent, and available for access or download in one or more formats.
Free/Libre/Open Source Software
Free/Libre/Open Source Software is a generic and internationalized term for software released under an Open Source license. "Free", "Libre" and "Open Source" are used as alternative names for such software.
License
A license is a legal document giving official permission to do something with the data with which it is associated.
Linked Data
A pattern for hyperlinking machine-readable data sets to each other using Semantic Web techniques, especially via the use of RDF and URIs. It enables distributed SPARQL queries of the data sets and a browsing or discovery approach to finding information (as compared to a search strategy). Linked Data is intended for access by both humans and machines. Linked Data uses the RDF family of standards for data interchange (e.g., RDF/XML, RDFa, Turtle) and query (SPARQL). If Linked Data is published on the public Web, it is generally called Linked Open Data.
Linked Open Data
Linked Data published on the public Web and licensed under one of several open licenses permitting reuse. Publishing Linked Open Data enables distributed SPARQL queries of the data sets and a "browsing" or "discovery" approach to finding information, as compared to a search strategy. 
Machine Readable Data

Data formats that may be readily parsed by computer programs without access to proprietary libraries. For example, CSV, TSV and RDF formats are machine readable, but PDF and Microsoft Excel are not. Creating and publishing data following Linked Data principles helps search engines and humans to find, access and re-use data. Once information is found, computer programs can re-use data without the need for custom scripts to manipulate the content.

Publishing machine readable data using Linked Data principles provides both a human readable and a machine readable version. For example, Wikipedia includes a Web page about the color Red. DBpedia, the database of structured content extracted from Wikipedia, allows a Linked Data client to look up "Red" [https://wikipedia.org/wiki/Red] by changing "wiki" to "data" and appending the appropriate file extension.

$ curl -L http://dbpedia.org/data/Red.ttl
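The URL rewrite described above can be sketched in Python. The function name `dbpedia_data_url` is hypothetical, and the sketch assumes the article title maps directly onto a DBpedia resource name, as it does in the "Red" example; that is not guaranteed for every article. It only builds the URL and does not fetch anything.

```python
from urllib.parse import urlparse


def dbpedia_data_url(wikipedia_url: str, extension: str = "ttl") -> str:
    """Build a DBpedia data URL from a Wikipedia article URL."""
    path = urlparse(wikipedia_url).path  # e.g. "/wiki/Red"
    prefix = "/wiki/"
    if not path.startswith(prefix):
        raise ValueError(f"not a Wikipedia article URL: {wikipedia_url}")
    title = path[len(prefix):]
    # Swap "wiki" for "data" and append the file extension.
    return f"http://dbpedia.org/data/{title}.{extension}"


print(dbpedia_data_url("https://wikipedia.org/wiki/Red"))
```

The result is the same address that the curl command above requests.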
Metadata
Information used to administer, describe, preserve, present, use or link other information held in resources, especially knowledge resources, be they physical or virtual. Metadata may be further subcategorized into several types (including general, access and structural metadata). Linked Data carries human and machine readable metadata with it, making it self-describing.
Ontology
A formal model that allows knowledge to be represented for a specific domain. An ontology describes the types of things that exist (classes), the relationships between them (properties) and the logical ways those classes and properties can be used together (axioms).
Resource Description Framework (RDF)
A family of international standards for data interchange on the Web produced by W3C. Resource Description Framework (RDF) is based on the idea of identifying things using Web identifiers or HTTP URIs, and describing resources in terms of simple properties and property values.
RDF database
A type of database designed specifically to store and retrieve RDF information. May be implemented as a triple store, quad store or other type.
Resource
In an RDF context, a resource can be anything that an RDF graph describes. A resource can be addressed by a Uniform Resource Identifier (URI).
Semantic Web
An evolution or part of the World Wide Web that consists of machine-readable data in RDF and an ability to query that information in standard ways (e.g. via SPARQL).
SPARQL
SPARQL Protocol and RDF Query Language (SPARQL) defines a query language for RDF data, analogous to the Structured Query Language (SQL) for relational databases. SPARQL is a family of standards of the World Wide Web Consortium.
SPARQL endpoint
A service that accepts SPARQL queries and returns answers to them as SPARQL result sets. It is a best practice for dataset providers to give the URL of their SPARQL endpoint to allow access to their data programmatically or through a Web interface. The status of some endpoints is listed at https://mondeca.com/sparqlEndpointsStatus/
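Programmatic access to an endpoint might look like the following Python sketch, which prepares a query as an HTTP GET request with a `query` parameter, as the SPARQL Protocol allows. The endpoint URL is a placeholder and `build_sparql_request` is a hypothetical helper; the request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint URL; substitute the one a data provider publishes.
ENDPOINT = "https://example.org/sparql"

# Ask for any five triples in the data set.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Prepare (but do not send) a SPARQL query as an HTTP GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the result set in the SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
print(request.full_url)
```

Passing `request` to `urllib.request.urlopen` would send it to a real endpoint; that step is left out here so the sketch runs without network access.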
Subject
The subject is the first part of an RDF statement. A subject in the context of a triple <?s ?p ?o> refers to who or what the RDF statement is about.
Standard
A technical standard is an established norm or requirement in regard to technical systems. It is usually a formal document that establishes uniform engineering or technical criteria, methods, processes and practices. In contrast, a custom, convention, company product, corporate standard, etc. that becomes generally accepted and dominant is often called a de facto standard.
Taxonomy
A taxonomy is a formal representation of relationships between items in a hierarchical structure.
Term
An entry in a controlled vocabulary, schema, taxonomy or ontology.
Thesaurus
A thesaurus, as used in information science and literature retrieval, is essentially a controlled vocabulary following a standard structure, where all terms in the thesaurus have relationships to each other. These relationships are typically of three kinds: hierarchical (broader term/narrower term), associative (see also), and equivalent (use/used for or see/seen from). In addition, some or all terms in a thesaurus commonly have scope notes: brief explanations of how the term should be used in indexing. Term history notes may also be present.
Thesauri are most often used in indexing periodical literature, especially over a period of time. Detailed thesauri have been created for specialized subject areas by publishers of periodical indexes. Often the thesauri themselves have become published works for purchase. (from Taxonomies & Controlled Vocabularies SIG)
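The three kinds of relationship can be sketched as a small Python data structure. The terms, the dictionary layout and the helper functions are all invented for this illustration and do not come from any published thesaurus.

```python
from typing import List, Optional

# Hypothetical mini-thesaurus, invented for this illustration.
thesaurus = {
    "food crops": {"broader": [], "related": [], "used_for": []},
    "cereals": {"broader": ["food crops"], "related": ["flour"], "used_for": ["grains"]},
    "maize": {"broader": ["cereals"], "related": [], "used_for": ["corn"]},
}


def preferred_term(label: str) -> Optional[str]:
    """Resolve a label via the equivalence (use/used for) relationship."""
    if label in thesaurus:
        return label
    for term, entry in thesaurus.items():
        if label in entry["used_for"]:
            return term
    return None


def broader_chain(term: str) -> List[str]:
    """Follow the hierarchical relationship up to the top term."""
    chain = []
    while thesaurus[term]["broader"]:
        term = thesaurus[term]["broader"][0]
        chain.append(term)
    return chain


print(preferred_term("corn"))
print(broader_chain("maize"))
```

Narrower terms are not stored separately in this sketch, since they can be derived by inverting the `broader` entries.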
Triple
An RDF statement, consisting of two things (a "subject" and an "object") and a relationship between them (a verb, or "predicate"). This subject-predicate-object triple forms the smallest possible RDF graph (although most RDF graphs consist of many such statements).
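A triple and a small graph can be modeled in plain Python as a rough sketch. The `Triple` type, the `match` helper and the example.org identifiers are assumptions for illustration; a real application would normally use an RDF library.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str  # the RDF "object"; renamed to avoid shadowing Python's builtin


# Hypothetical identifiers under example.org, invented for this sketch.
EX = "http://example.org/"
graph = [
    Triple(EX + "Red", EX + "isA", EX + "Color"),
    Triple(EX + "Blue", EX + "isA", EX + "Color"),
    Triple(EX + "Red", EX + "label", "red"),
]


def match(triples: List[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Return triples fitting a pattern; None is a wildcard, like ?s ?p ?o."""
    return [
        t for t in triples
        if (s is None or t.subject == s)
        and (p is None or t.predicate == p)
        and (o is None or t.obj == o)
    ]


# Every statement whose subject is "Red".
about_red = match(graph, s=EX + "Red")
print(len(about_red))
```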
Triple store
A colloquial phrase for an RDF database that stores RDF triples.
Tuple
An ordered list of elements. RDF statements are 3-tuples: ordered lists of three elements.
Uniform Resource Identifier (URI)
A global identifier standardized by joint action of the World Wide Web Consortium and Internet Engineering Task Force. A Uniform Resource Identifier (URI) may or may not be resolvable on the Web. URIs play a key role in enabling Linked Data. URIs can be used to uniquely identify virtually anything including a physical building or more abstract concepts such as colors. 

URIs have been known by many names: Web addresses, Universal Document Identifiers, Universal Resource Identifiers.

Uniform Resource Locator (URL)
A global identifier for Web resources standardized by joint action of the World Wide Web Consortium and Internet Engineering Task Force. A URL is resolvable on the Web and is commonly called a "Web address". All HTTP URLs are URIs; however, not all URIs are URLs.
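The distinction can be illustrated with a short Python sketch. It checks only the syntax of an identifier (an HTTP or HTTPS scheme plus a host name) and makes no attempt to resolve anything on the Web; `is_http_url` is a hypothetical helper, and the ISBN URN is just an example of a URI that is not a URL.

```python
from urllib.parse import urlparse


def is_http_url(uri: str) -> bool:
    """Syntactic check only: HTTP(S) scheme and a host name are present."""
    parts = urlparse(uri)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


identifiers = [
    "http://dbpedia.org/data/Red.ttl",  # a URI that is also an HTTP URL
    "urn:isbn:0451450523",              # a URI that names a book, not a location
]

for identifier in identifiers:
    print(identifier, is_http_url(identifier))
```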
Vocabulary
A collection of "terms" for a particular purpose. Vocabularies can range from simple ones, such as the widely used RDF Schema, FOAF and Dublin Core Metadata Element Set, to complex vocabularies with thousands of terms, such as those used in healthcare to describe symptoms, diseases and treatments. Vocabularies play a very important role in Linked Data, specifically to help with data integration. The use of this term overlaps with Ontology.