What is the Semantic Web
- Humans can understand web content
- Machines (computers) need some help
- In a semantic environment, Machines can understand the meaning of a thing (document, port, text, system status etc.) and link to/from it.
- Metadata are a type of semantics
What do Machines read
Word word link word word word
word link
word link
word link.
What is Linked Data
- Really connected structured web
- Bizer, Heath and Berners-Lee (2009)
- Internet of Things not only of documents
- Linked Data may not be Open Data
- Automatic decisions
- Find information faster, easier and with accuracy
- The basic 'engine' behind Big Data
- The key for Open Government/Cities/Health/...
The 5th star of Open Data
Tech behind linked data
- URI (Uniform Resource Identifier)
- RDF (Resource Description Framework)
- HTTP
- Vocabularies - Ontologies
- SPARQL
The 4 principles of Linked Data
Tim Berners-Lee, 2006
- Use URIs to name (identify) things
- Use HTTP URIs so that these things can be looked up (interpreted, dereferenced)
- Provide useful information about what a name identifies when it's looked up, using open standards such as RDF, SPARQL etc
- Refer to other things using their HTTP URI-based names when publishing data on the Web
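As a rough sketch of the second and third principles, the Python snippet below prepares (but does not send) an HTTP request that looks up a thing's URI and asks for an RDF serialization through content negotiation. The URI and the helper name `build_lookup_request` are hypothetical, and whether a given server honours the `Accept` header is an assumption.

```python
from urllib.request import Request


def build_lookup_request(uri: str) -> Request:
    """Prepare an HTTP GET that prefers Turtle and accepts RDF/XML as a fallback."""
    return Request(
        uri,
        headers={"Accept": "text/turtle, application/rdf+xml;q=0.8"},
        method="GET",
    )


# Hypothetical HTTP URI that names a thing (not a real dataset).
request = build_lookup_request("http://www.example.org/people/john")
print(request.get_method(), request.full_url)
print(request.get_header("Accept"))
```

Passing the prepared request to `urllib.request.urlopen` would perform the actual lookup; that step is left out here so the sketch runs without network access.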
RDF
- A graph data model for describing things
- Describe the relationships between things
- W3C specification (first draft 1997, Recommendation 1999)
- Only describes resources
- It's a data model (a concept), not a data format
- It needs serialization
- Current version RDF 1.1 (2014) after 1.0 (2004)
The RDF triples
- [subject] [predicate] [object]
- "John knows Mary"
- We must state explicitly the nature of the connection
- We can download or query triples from triplestores
- Types of RDF triples: Literal Triples and RDF Links
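To make the triple structure concrete, here is a minimal Python sketch that models "John knows Mary" as a (subject, predicate, object) tuple and tells a literal triple from an RDF link. The `Triple` class, the `kind` helper and the `http://www.example.org/` namespace are illustrative assumptions, not part of any RDF library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str             # URI of the thing being described
    predicate: str           # URI stating the nature of the connection
    obj: str                 # URI of another thing, or a literal value
    object_is_literal: bool  # True for literal triples, False for RDF links


EX = "http://www.example.org/"       # hypothetical namespace for the example
FOAF = "http://xmlns.com/foaf/0.1/"

triples = [
    Triple(EX + "john", FOAF + "knows", EX + "mary", False),  # RDF link
    Triple(EX + "john", FOAF + "name", "John", True),         # literal triple
]


def kind(triple: Triple) -> str:
    """Classify a triple by what its object is."""
    return "literal triple" if triple.object_is_literal else "RDF link"


for t in triples:
    print(kind(t), "|", t.subject, t.predicate, t.obj)
```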
RDF - URI
Absolute IRI which may include a # fragment.
<http://www.example.org/>
<http://www.example.org/#fragment>
Relative IRI resolved against base IRI.
<abc.rdf>
Base IRI, usually the query document IRI
<>
IRI shorthand using XML-style prefix ex and local name.
Declared with PREFIX (SPARQL) or @prefix (Turtle)
ex:name
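The resolution of relative IRIs against a base IRI can be approximated in Python with `urllib.parse.urljoin`. This is only a sketch: the base IRI below is an assumed value, and `urljoin` implements URL reference resolution, which is close to but not a full IRI processor.

```python
from urllib.parse import urljoin

# Assumed base IRI; in practice it is usually the IRI of the document itself.
BASE = "http://www.example.org/data/"


def resolve(iri_reference: str, base: str = BASE) -> str:
    """Resolve a relative IRI reference (the text between < and >) against the base."""
    return urljoin(base, iri_reference)


print(resolve("abc.rdf"))    # relative IRI
print(resolve(""))           # <> stands for the base IRI itself
print(resolve("#fragment"))  # fragment added to the base IRI
```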
RDF - Literal
A Unicode string with an optional language tag.
"hello"
"bonjour"@fr
"1234"
RDF - Typed Literal
Literals with an XML schema datatype
A Unicode string and datatype IRI for encoding datatypes.
"1234"^^<http://www.w3.org/2001/XMLSchema#string>
Abbreviated with an XML QName style as:
"1234"^^xsd:string
Short forms for several common datatypes:
-10
"-10"^^xsd:integer
1.2345
"1.2345"^^xsd:decimal
true
"true"^^xsd:boolean
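The short forms above can be expanded mechanically. The following Python sketch maps a shorthand token to its full typed-literal form; it covers only the three datatypes listed here (integer, decimal, boolean), and the function name `expand_shorthand` is a hypothetical helper.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"


def expand_shorthand(token: str) -> str:
    """Return the full typed literal for an integer, decimal or boolean short form."""
    if token in ("true", "false"):
        datatype = "boolean"
    elif re.fullmatch(r"[+-]?[0-9]+", token):
        datatype = "integer"
    elif re.fullmatch(r"[+-]?[0-9]*\.[0-9]+", token):
        datatype = "decimal"
    else:
        raise ValueError(f"not a supported short form: {token!r}")
    return f'"{token}"^^<{XSD}{datatype}>'


for token in ("-10", "1.2345", "true"):
    print(token, "=>", expand_shorthand(token))
```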
Taxonomies, Vocabularies and Ontologies
- Domain-specific terms for describing classes (groups) of things and how they relate to each other
- Lightweight ontologies in RDF are often referred to as vocabularies
- They borrow classes/properties from each other
- We can create our own, extend or reuse ontologies
- Popular: Schema.org, SKOS, FOAF, DCMI, SIOC etc.
- Overview of Linked Open Vocabularies at LOV
RDF - Namespaces and Prefixes
| Namespace | Prefix | Namespace URI |
|---|---|---|
| RDF | rdf: | http://www.w3.org/1999/02/22-rdf-syntax-ns# |
| Dublin Core | dc: | http://purl.org/dc/elements/1.1/ |
| FOAF | foaf: | http://xmlns.com/foaf/0.1/ |
| XML Schema Datatypes | xsd: | http://www.w3.org/2001/XMLSchema# |
| RDFS | rdfs: | http://www.w3.org/2000/01/rdf-schema# |
| OWL | owl: | http://www.w3.org/2002/07/owl# |
Find prefixes at prefix.cc
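A prefixed name is just shorthand for a namespace URI followed by a local name. The Python sketch below expands prefixed names using the namespaces listed above; the `expand` helper is hypothetical and ignores details such as escaping in local names.

```python
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
}


def expand(prefixed_name: str) -> str:
    """Turn a prefixed name such as foaf:name into the full URI."""
    prefix, separator, local_name = prefixed_name.partition(":")
    if not separator or prefix not in PREFIXES:
        raise KeyError(f"unknown prefix in {prefixed_name!r}")
    return PREFIXES[prefix] + local_name


print(expand("foaf:name"))
print(expand("rdf:type"))
```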
The RDF Graph
- A collection of statements about a thing
- Starting with the same Subject
- The URIs occurring as subject and object are the nodes in the graph
- A real example of a webpage
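One way to picture how triples form a graph: subjects and objects become nodes, and each predicate becomes a labelled edge. The Python sketch below builds such a structure from a few made-up triples; the people URIs and the `build_graph` helper are assumptions for illustration only.

```python
from collections import defaultdict

EX = "http://www.example.org/"  # hypothetical namespace
KNOWS = "http://xmlns.com/foaf/0.1/knows"

triples = [
    (EX + "john", KNOWS, EX + "mary"),
    (EX + "john", KNOWS, EX + "anna"),
    (EX + "mary", KNOWS, EX + "anna"),
]


def build_graph(statements):
    """Return the set of nodes and, per subject, its outgoing (predicate, object) edges."""
    nodes = set()
    outgoing = defaultdict(list)
    for subject, predicate, obj in statements:
        nodes.update((subject, obj))              # subjects and objects are nodes
        outgoing[subject].append((predicate, obj))  # the predicate labels the edge
    return nodes, dict(outgoing)


nodes, outgoing = build_graph(triples)
print(len(nodes), "nodes")
for predicate, obj in outgoing[EX + "john"]:
    print(EX + "john", "--", predicate, "->", obj)
```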
SPARQL - Reference synopsis
| Patterns | Modifiers | Query Forms |
|---|---|---|
| RDF terms | DISTINCT | SELECT |
| Triple patterns | REDUCED | CONSTRUCT |
| Basic graph patterns | PROJECT | DESCRIBE |
| Groups | ORDER BY | ASK |
| OPTIONAL | LIMIT | |
| UNION | OFFSET | |
| GRAPH | | |
| FILTER | | |
SPARQL - Common Syntax
# prefix declarations
PREFIX foo: <http://example.com/resources/>
...
# dataset definition
FROM ...
# result clause
SELECT ... ?variables
# query pattern
WHERE {
... ?variables
}
# query modifiers
ORDER BY ...
LIMIT n OFFSET m
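To give a feel for what the query pattern and the modifiers do, here is a toy Python sketch that matches a single triple pattern against in-memory triples and then applies ordering, offset and limit. It is not a SPARQL engine: the data, the `match_pattern` and `select` helpers, and the use of prefixed names as plain strings are all simplifying assumptions.

```python
from typing import Dict, Iterator, List, Tuple

Triple = Tuple[str, str, str]

# Made-up data; prefixed names are kept as plain strings for brevity.
TRIPLES: List[Triple] = [
    ("ex:john", "foaf:knows", "ex:mary"),
    ("ex:john", "foaf:name", "John"),
    ("ex:mary", "foaf:knows", "ex:anna"),
]


def match_pattern(pattern: Triple, triples: List[Triple]) -> Iterator[Dict[str, str]]:
    """Yield one variable binding per triple that matches the pattern (WHERE)."""
    for triple in triples:
        binding: Dict[str, str] = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable must bind to the same value everywhere in the pattern.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield binding


def select(pattern: Triple, triples: List[Triple], order_by: str,
           limit: int, offset: int = 0) -> List[Dict[str, str]]:
    """Apply ORDER BY, then OFFSET and LIMIT, to the matched bindings."""
    rows = sorted(match_pattern(pattern, triples), key=lambda row: row[order_by])
    return rows[offset:offset + limit]


rows = select(("?person", "foaf:knows", "?friend"), TRIPLES,
              order_by="?person", limit=1, offset=1)
print(rows)
```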
SPARQL - Tips
- The same results can be achieved with different queries
- More accurate queries are faster
- Prefer Group Graph patterns instead of FILTER
- Avoid SELECT *
- Avoid ORDER BY
- Avoid DISTINCT (use REDUCED)
- Use OFFSET, LIMIT to paginate results
- Variable names are case sensitive (?var is not the same as ?VAR); keywords such as SELECT are case insensitive
- A variable must keep the same name throughout a query
- Variables can start with ? or $
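The pagination tip can be sketched in Python as a small generator that appends LIMIT and OFFSET to a base query, one page at a time. The base query and the helper name `paged_queries` are hypothetical, and sending the queries to an endpoint is deliberately left out.

```python
from typing import Iterator

# Hypothetical base query; it names only the variable it needs instead of SELECT *.
BASE_QUERY = (
    "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n"
    "SELECT ?name\n"
    "WHERE { ?person foaf:name ?name }"
)


def paged_queries(base_query: str, page_size: int, pages: int) -> Iterator[str]:
    """Yield the base query once per page, with LIMIT and OFFSET appended."""
    for page in range(pages):
        yield f"{base_query}\nLIMIT {page_size} OFFSET {page * page_size}"


for query in paged_queries(BASE_QUERY, page_size=10, pages=3):
    print(query.splitlines()[-1])
```

Note that SPARQL does not guarantee a result order without ORDER BY, so pages are only certain to be stable when the query also fixes an order. That is a trade-off against the "Avoid ORDER BY" tip.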
The problems of Linked Data
- Confusing standardization (see W3C)
- Hard for developers to understand
- Which schema and ontology to use
- Need for powerful/special software
- Missing metadata for the datasets
- Cannot really rely on endpoints (see SPARQL endpoint status from Datahub.io linked datasets)
- Quality and accuracy of the data
wwwL(inked)
Linked data, RDF, SPARQL and the future
Presentation by TheodorosPloumis / @theoploumis
Meetup No 29 - 16 March 2016 - TechMinistry.
Under Attribution 4.0 International license.