A call to understand a bit more about sharing data, metadata, linking things up and how it all plays together in today's Web to help answer tomorrow's challenges.
Publish data on the Web: what Open Data is and why it matters. (*) - part 1
- Why it matters
- Stop data hugging, go for Open Data !
How to make your data (re)usable by others (**, ***) - part 2
- Structure your data
- Use Open standards and formats
Make it better! (****, *****) - part 3
- URIs – sustainability
- Link it up - LOD
Publish and unleash your data, now !
Making data available to others is very likely to make it richer and more powerful, and to create more value for you and for others, than if it sat forever in your database (or files).
Some kinds of data are definitely not intended to be made public (customer details, medical records and so on; generally speaking, much of the so-called « personal data »). On the other hand, a lot of other data benefits many communities when it is published openly.
Calls to release data have a long history in science (1957-1958: creation of the World Data Center, known as the World Data System since 2009 [1]) and among governments and inter-governmental organizations [2], with more and more initiatives pushing towards what is now commonly known as Open Data.
Why you should release your data
- Access to knowledge. Some data simply cannot be licensed or patented; they belong to us all. This includes things such as genomes, environmental data and facts, and arguably publicly funded data should be accessible too.
- Transparency / accountability. This applies to governments and non-profit organizations as well as to larger companies or even markets. Publishing data makes it possible for everyone to check the facts for themselves.
- Increased value for all parties. For those who fund the dataset as well as for those who use it, it is often a win-win situation.
If we haven't convinced you that it is a good idea to publish your data, have a look at "The year open data went worldwide", a 5-minute talk by Tim Berners-Lee.
[1] https://www.icsu-wds.org/
[2] http://www.oecd.org/science/sci-tech/oecdprinciplesandguidelinesforaccesstoresearchdatafrompublicfunding.htm
Put it on the Web, go for Open Data !
At this point, you are probably wondering: « Can I make some of the data I have publicly available? » If the answer is yes, you should decide to publish them online, now!
It is a bit like printing this information on paper to make it easier to distribute more broadly.
So, what do you have to do in order to start?
Here is a small checklist that may help you:
- What data do you have available?
- List it all: files, internal sites, in-house software/systems (or even some paper documents, really? :)
- Summarize how much of it you have, what « extra » information (the so-called metadata) you have about it, such as authors and date of publication, and how you could export it (ideally using a standard format).
- Make sure you have the right to publish it, that the licensing terms are compatible, and so on.
- Find an appropriate way and place to share them.
This step depends a lot on the resources, organization and context around you. Nonetheless, you don't need much, or to be a technical person, to start making a difference.
As a first step, simply making them available online, whatever the form, is already going to be useful.
Is it there yet? Yes? Good, but this is not enough. Now you should consider entering the wonderful world of Open Data!
As explained before, publishing your data more « openly » is surely going to make them more useful. It might even bring you some surprises in the ways the data gets used (like someone building a graph or a map out of it), or looked at and analyzed, that you wouldn't necessarily have thought of.
What is Open Data anyway ?
As in « Open Bar », people come and get what they want: they help themselves.
Open data « ...should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control. » (Wikipedia, https://en.wikipedia.org/wiki/Open_data)
A rule of thumb for making your data open
- Publish it on the Web, give it a proper address
- Make it easy for people to access it and gather it; do not put up barriers of any kind.
Otherwise it is not really « Open », is it? This must be a complete giveaway, no more hugging!
- Make it really usable & re-usable.
Publish it under an open license (such as PDDL, ODC-By or CC0). Use open standards and formats; avoid proprietary solutions and silos.
By publishing Open Data you allow others to access it, download it, transform it and re-share it with others.
The Open Definition website, a project by OKFN, gives a good, concise overview of what « Open » means in the information technology context.
Make your data re-usable: Structure them and use open standards and open formats.
Now you have reached the 1st star of the 5-star Open Data track, but to make your data climb the ladder you need to make them truly accessible and usable.
Without going into too many technical details, here is what probably matters the most.
First, they need to be structured.
In short, that means you have thought ahead about how they are going to be organized internally and you store them in a consistent way (this is commonly done using relational databases or spreadsheets as the place to store your data).
The goal here is to make them processable by machines in ways that can be described reasonably easily by a person (who is not necessarily a programmer).
Some important points
- Be consistent over time (do not change methodologies all the time, or at least provide some way to still reach all the data; don't lose any!)
- Use understandable vocabularies or, even better, commonly known or accepted ones.
A very simple, self-explanatory example: if you have a dataset about countries, you should most likely use the 2- or 3-letter country codes that are widely accepted and used by tons of organizations.
- Organize your data in a comprehensible way.
No need to over-engineer it; basic common sense should be enough for most simple datasets.
For instance, if you are going to publish a dataset of indicators by country every month, it probably makes sense to organise it as tabular data, with one dimension for the countries and one for the indicators, and to publish a new file every month with the new data. Then reference that new publication along with the others (update an index of links, publish a news item or announcement; it essentially depends on your framework/workflow).
So unless you are doing something pretty specialized in a specific field like theoretical science, advanced finance or complex statistical work, you shouldn't worry too much about it. It is always good, and often enriching, to ask others for their opinions and feedback from experience, and to build upon them.
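As a rough illustration of the monthly tabular layout described above, here is a minimal Python sketch using only the standard library. The file naming pattern, the column names and the numbers are all assumptions made up for this example, not a recommended standard.

```python
import csv
import io

# Hypothetical indicator values; the numbers are invented for illustration.
ROWS = [
    {"country": "BEL", "indicator_a": 2.71, "indicator_b": 10.0},
    {"country": "CAN", "indicator_a": 1.556, "indicator_b": 12.5},
]


def monthly_filename(year, month):
    """Return a predictable file name such as 'indicators-2024-03.csv'."""
    return f"indicators-{year:04d}-{month:02d}.csv"


def to_csv(rows):
    """Serialize rows as CSV: one line per country, one column per indicator."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def add_to_index(index, filename):
    """Return a sorted list of published files that includes the new one."""
    return sorted(set(index) | {filename})


# Publish March 2024 next to an already published February file.
index = add_to_index(["indicators-2024-02.csv"], monthly_filename(2024, 3))
csv_text = to_csv(ROWS)
```

Keeping the file name predictable and the index sorted means that anyone (or any script) can find the latest month without asking you.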
Designing a website, nice pages or APIs shouldn't be your first priority when it comes to releasing data. Give us the "Raw Data Now"! (a call by Tim Berners-Lee, Wired, Nov. 2012)
Use open standards & formats, allow others to re-use your work easily
To climb up the star ladder and get to the 2nd and 3rd stars, you need to « consolidate » most of the above.
The 2nd star is really the step where people can re-use your data.
Using widespread, well-known technologies is often a good starting point, such as putting your data in an Excel spreadsheet or any tool that can hold them in a structured way. This lets many people open and process them, but such tools are not necessarily free or available to all, and sometimes they can be a silo (proprietary formats, or simply software that doesn't let you import/export data).
Naturally, the 3rd star is about the use of open standards and formats.
This will dramatically lower the barriers for others to reuse the data, not only cost-wise but also on the technology side. With open formats, there is a good chance of finding many different tools, libraries and software packages that can deal with them.
Allowing everyone to process your data without confining them to a specific solution is a big attraction for developers, researchers, inventors and anyone with an interest in what you publish. It also guarantees more transparency and control at every level. These are issues the Open Source software community has been battling with and trying to explain for a long time [1].
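To give an idea of what an open format buys a re-user, the sketch below reads a small CSV sample and re-publishes it as JSON using nothing but the Python standard library. The column names and values are invented for the example; the point is only that no particular vendor tool is needed.

```python
import csv
import io
import json

# Hypothetical CSV content, as someone might download it from your site.
CSV_TEXT = "country,share\nBEL,2.71\nCAN,1.556\n"


def csv_to_records(text):
    """Parse CSV text into a list of dictionaries with typed values."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {"country": row["country"], "share": float(row["share"])}
        for row in reader
    ]


records = csv_to_records(CSV_TEXT)

# The same data, re-shared in another open format.
json_text = json.dumps(records, indent=2)
```

A re-user working in another language could do the same in a few lines, which is exactly what a closed format makes hard.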
An important part of building a good information system is to make it sustainable and coherent over time. On the Web this is done using URIs [2].
Carefully designing your namespaces and being precise when you publish your data is surely a good thing to do. Advanced uses of complex URI schemes, such as those for government linked data, give a good idea of how an organization can break down the use of its domain and subdomain names, in addition to various namespace designs [3].
In the end, it doesn't have to be anything too fancy to start with: most simple web servers and systems easily allow you to host various types of resources where you want them to be (by choosing your own URIs).
Ah! And remember that, as much as possible, Cool URIs don't change!
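One simple way to keep URIs stable is to build them from a fixed pattern rather than by hand. The following Python sketch assumes a made-up base address (data.example.org) and a made-up path layout; it only illustrates the idea that the same dataset and period always give the same URI.

```python
from urllib.parse import quote, urljoin

# Hypothetical base namespace; replace it with a domain you control.
BASE = "https://data.example.org/"


def dataset_uri(dataset, year, month, fmt="csv"):
    """Build a predictable URI from the dataset name and its period."""
    path = f"dataset/{quote(dataset)}/{year:04d}-{month:02d}.{fmt}"
    return urljoin(BASE, path)


uri = dataset_uri("land-indicators", 2024, 3)
```

Because nothing in the pattern depends on the software serving the file, you can change servers or tools later without changing the address people have already bookmarked or linked to.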
[1] https://en.wikibooks.org/wiki/FOSS_Open_Standards/Importance_and_Benefits_of_Open_Standard
[2] https://en.wikipedia.org/wiki/Uniform_resource_identifier
[3] https://www.w3.org/2011/gld/wiki/223_Best_Practices_URI_Construction
Going further with Linked Open Data...
At the Land Portal we strongly advocate this approach, to build an even more powerful and useful network of data.
Today's technologies allow us to go a step further when we exchange and interconnect our data. Without going into too many technical details, they allow us to create meaningful « links » between them [1].
So the Web now offers us the possibility to link concepts together and describe the relations that tie them. This is what Linked Data is all about, and to be really powerful it relies on those data being « Open ».
Linked Open Data allows us to create powerful relations between data that make them much more useful than if they were isolated.
Let's pick a very trivial example to illustrate how a dataset can benefit from LOD.
Organisation "A" publishes statistics on what percentage of the land is owned by the richest 1% of people, for 100 countries.
This organisation only publishes its results in English, but to identify countries it uses the ontology (aka vocabulary, taxonomy, tags...) of another organisation, let's call it "O", which is richer and provides the list in many languages. This links the two lists of countries.
Concretely, organisation A's dataset might look like this (numbers are completely fake):
AGO : Angola : 24.345
BEL : Belgium : 2.71
CAN : Canada : 1.556
And so on...
Organisation O's list might be:
AGO : Angola : Angola : Africa ...
BEL : Belgium : Belgique : Europe ...
CAN : Canada : Canada : America ...
...
"Linking" the two datasets means that organisation "A" has re-used organisation "O"'s code for each country (likely, hopefully, the 3-letter ISO one).
This way, any application that already uses organisation "O"'s country list can very easily use organisation "A"'s dataset as well, without much integration effort (such as mapping the countries one by one), even if it uses organisation "O"'s country list in a different language or for the regions.
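The benefit of sharing country codes can be sketched in a few lines of Python. This is only a plain lookup by shared key, not a full Linked Data implementation (which would typically rely on URIs and RDF); the two small dictionaries reuse the fake Belgium and Canada values from the example, and the function name is an assumption.

```python
# Organisation "A": share of land held by the richest 1% (fake numbers).
org_a = {"BEL": 2.71, "CAN": 1.556}

# Organisation "O": country names in several languages, plus the region.
org_o = {
    "BEL": {"en": "Belgium", "fr": "Belgique", "region": "Europe"},
    "CAN": {"en": "Canada", "fr": "Canada", "region": "America"},
}


def link(stats, countries, language):
    """Combine A's statistics with O's labels through the shared code."""
    linked = []
    for code, value in sorted(stats.items()):
        country = countries[code]
        linked.append(
            {
                "code": code,
                "name": country[language],
                "region": country["region"],
                "value": value,
            }
        )
    return linked


# An application working in French gets A's English-only data for free.
linked_fr = link(org_a, org_o, "fr")
```

No country-by-country mapping was written anywhere: the shared code does all the work.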
We believe that publishing Linked Open Data is really a step up toward more accessibility, sustainability and transparency, and that it will bring many side benefits by making data easier to re-use, mix, transform and republish.
See also: http://linkeddata.org/
[1] https://en.wikipedia.org/wiki/Semantic_Web
What will you do to share that data now ?
Why wait any longer? If we have convinced you that sharing data is a good thing, start today!
- Find out what data your organisation can share as Open Data
- Gather the data by any means, and enrich it with basic metadata (who you are, who created the data, where it comes from and what the original source is, its date, etc.)
- Publish it online and create an index somewhere to allow others to access your data.
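The basic metadata and the index mentioned in this list do not need special software. Below is a minimal Python sketch that describes one dataset and writes a one-entry index as JSON. Every field name and value here is hypothetical, chosen for illustration rather than taken from any metadata standard.

```python
import json


def describe_dataset(title, publisher, creator, source, date, license_name, url):
    """Return a basic metadata record for one published dataset."""
    return {
        "title": title,
        "publisher": publisher,  # who you are
        "creator": creator,      # who created the data
        "source": source,        # where the data originally comes from
        "date": date,            # publication date, ISO 8601 (YYYY-MM-DD)
        "license": license_name,
        "url": url,              # where others can download it
    }


# All values below are made up for the example.
record = describe_dataset(
    title="Land indicators, March 2024",
    publisher="Example Organisation",
    creator="Example Statistics Unit",
    source="National land survey (hypothetical)",
    date="2024-03-31",
    license_name="CC0",
    url="https://data.example.org/dataset/land-indicators/2024-03.csv",
)

# A very small index that can be published next to the data files.
index_json = json.dumps({"datasets": [record]}, indent=2)
```

Publishing such an index file alongside the data already gives others a single place to discover what you share and under which terms.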
> Send us some suggestions / comments
> Participate in the Land portal data exchange program
> Read more about the Land Portal