BIS PhD Seminar

Processing and Evaluating Large Amounts of Open Data

Created by Ivan Ermilov, 3rd March 2014

Who Am I? What is hidden behind the topic name?

Retrospective

Last PhD Seminar - 3rd December 2012
AKSW Colloquium
I have been part of the AKSW/BIS group since September 2011
What's the progress so far?

My Progress

Projects
Publications
Other Activities

Projects

SlideWiki
CubeViz
LODStats (current)
CSV2RDF Mass Conversion Framework (current)

Publications

SlideWiki: Elicitation and Sharing of Corporate Knowledge using Presentations by Ali Khalili, Sören Auer, Darya Tarasowa, and Ivan Ermilov in Proceedings of the EKAW 2012
Towards an Open-Governmental Data Web by Ivan Ermilov, Claus Stadler, Michael Martin, and Sören Auer in Proceedings of the KESW 2012
Linked Open Data Statistics: Collection and Exploitation by Ivan Ermilov, Michael Martin, Jens Lehmann, and Sören Auer in Proceedings of the 4th Conference on Knowledge Engineering and Semantic Web
CSV2RDF: User-Driven CSV to RDF Mass Conversion Framework by Ivan Ermilov, Sören Auer, and Claus Stadler in Proceedings of the ISEM '13, September 04 - 06 2013, Graz, Austria

Other Activities

Participated in the Semantic Web Challenge 2013 with the CSV2RDF project.

Planned and pending publications

A Fine-Grained Evaluation of SPARQL Endpoint Federation Systems by Muhammad Saleem, Yasar Khan, Ali Hasnain, Ivan Ermilov, Axel-Cyrille Ngonga Ngomo (submitted to Semantic Web Journal)
CubeViz journal publication (Michael Martin)
LODStats journal publication

Work in progress

LODStats project: a statement-stream-based approach for gathering comprehensive statistics about RDF datasets.

CSV2RDF project: user-driven semantic mapping of tabular data.
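As a rough illustration of what a semantic mapping of tabular data can look like, the sketch below turns CSV rows into N-Triples lines using a column-to-property table. It is not the CSV2RDF implementation: the `MAPPING` table, the `BASE_URI` namespace, the `example.org` property and the helper names are all assumptions made for this example. In a user-driven setting the mapping table would be edited by people rather than hard-coded.

```python
import csv
import io

# Hypothetical mapping from CSV column headers to RDF property URIs.
MAPPING = {
    "name": "http://xmlns.com/foaf/0.1/name",
    "population": "http://example.org/ontology/population",
}

BASE_URI = "http://example.org/resource/"  # placeholder namespace


def escape_literal(value):
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
            .replace("\n", "\\n").replace("\r", "\\r"))


def csv_to_ntriples(text, mapping, base_uri):
    """Yield one N-Triples line per mapped, non-empty cell.

    Each CSV row becomes one subject; unmapped columns are ignored.
    """
    reader = csv.DictReader(io.StringIO(text))
    for row_number, row in enumerate(reader, start=1):
        subject = f"<{base_uri}row{row_number}>"
        for column, prop in mapping.items():
            value = row.get(column)
            if value:  # skip missing or empty cells
                yield f'{subject} <{prop}> "{escape_literal(value)}" .'


# Made-up sample table; the second row has an empty "population" cell.
sample = "name,population\nAlpha,100\nBeta,\n"
triples = list(csv_to_ntriples(sample, MAPPING, BASE_URI))
for line in triples:
    print(line)
```

All values are emitted as plain string literals here; typed literals and links to existing resources are left out to keep the sketch short.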

LODStats

Evaluates RDF datasets triple by triple
Gathers 32 statistical criteria, such as:
- Number of triples, entities, literals
- Average string length
- Vocabularies and classes used
Helps to understand the data
Generates VoID descriptions
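The triple-by-triple idea can be sketched as a small accumulator that sees each statement once and updates a few counters. This is a simplified illustration, not the LODStats code: the `StreamStats` class, its method names and the four-field triple representation are assumptions, only a handful of the 32 criteria are covered, and distinct subjects stand in for entities. The VoID terms used in the export (`void:triples`, `void:distinctSubjects`, `void:classes`) are taken from the VoID vocabulary.

```python
from collections import Counter

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


class StreamStats:
    """Accumulates a few statistics while triples stream past (hypothetical helper)."""

    def __init__(self):
        self.triples = 0
        self.subjects = set()      # distinct subjects, a stand-in for entities
        self.literals = 0
        self.literal_chars = 0
        self.classes = Counter()   # usage count per class URI

    def feed(self, subject, predicate, obj, obj_is_literal):
        """Update all counters from a single statement."""
        self.triples += 1
        self.subjects.add(subject)
        if obj_is_literal:
            self.literals += 1
            self.literal_chars += len(obj)
        elif predicate == RDF_TYPE:
            self.classes[obj] += 1

    def average_string_length(self):
        return self.literal_chars / self.literals if self.literals else 0.0

    def to_void(self, dataset_uri):
        """Render the counters as a small VoID description in Turtle."""
        return "\n".join([
            "@prefix void: <http://rdfs.org/ns/void#> .",
            f"<{dataset_uri}> a void:Dataset ;",
            f"    void:triples {self.triples} ;",
            f"    void:distinctSubjects {len(self.subjects)} ;",
            f"    void:classes {len(self.classes)} .",
        ])


# Made-up statements: (subject, predicate, object, object_is_literal)
stream = [
    ("http://example.org/a", RDF_TYPE, "http://example.org/City", False),
    ("http://example.org/a", "http://example.org/label", "Alpha", True),
    ("http://example.org/b", "http://example.org/label", "Be", True),
]

stats = StreamStats()
for statement in stream:
    stats.feed(*statement)

print(stats.average_string_length())
print(stats.to_void("http://example.org/dataset"))
```

Counters and sums need constant memory per criterion, while the set of distinct subjects grows with the dataset, which is one place where a real implementation would need more care.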

LODStats Architecture

LODStats Features

Report of warnings and errors for datasets
Report on statistical criteria for datasets
Export as VoID/DataCube metadata
General LOD cloud statistics
Search function for datasets, vocabularies, classes, properties, languages, datatypes
REST interface for the search functions
Linked Data publication of statistics
SPARQL endpoint
CubeViz installation for the statistics visualization

LODStats Hidden Features

Which visualization is suitable for my dataset?
- RDF DataCube Vocabulary/Ontology used?
- Contains geospatial information?
- What is the class hierarchy depth?
Does the dataset fit a specific application?
- Linked to DBpedia?
- Largest known dataset in the domain?
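Questions of this kind can, in principle, be answered from statistics that have already been collected. The sketch below shows one possible shape for such checks, assuming the set of vocabulary namespaces used in a dataset and its subclass links are available. The function names and the suggestion rules are hypothetical; only the two namespace URIs (RDF Data Cube and W3C WGS84 geo) are real.

```python
# Real vocabulary namespaces; the rules built on them are illustrative only.
DATA_CUBE_NS = "http://purl.org/linked-data/cube#"
WGS84_GEO_NS = "http://www.w3.org/2003/01/geo/wgs84_pos#"


def suggest_visualizations(vocabularies):
    """Map the vocabularies used by a dataset to candidate visualization types."""
    suggestions = []
    if DATA_CUBE_NS in vocabularies:
        suggestions.append("statistical charts (Data Cube browser)")
    if WGS84_GEO_NS in vocabularies:
        suggestions.append("map view")
    return suggestions


def hierarchy_depth(subclass_of):
    """Length of the longest subclass chain.

    subclass_of maps a class to its single parent, a simplification:
    real class hierarchies may have several parents per class.
    """
    def depth(cls, seen):
        parent = subclass_of.get(cls)
        if parent is None or parent in seen:  # root reached or cycle detected
            return 0
        return 1 + depth(parent, seen | {cls})

    return max((depth(cls, frozenset()) for cls in subclass_of), default=0)


used = {DATA_CUBE_NS, "http://xmlns.com/foaf/0.1/"}
print(suggest_visualizations(used))
print(hierarchy_depth({"Town": "Settlement", "Settlement": "Place"}))
```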

LODStats: Future Steps

Parallel processing
- Inside the dataset
- Distributing datasets across the cluster
LinDA project (visualization framework)
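The two levels of parallel processing listed above can be sketched with a worker pool: inside a dataset, the triples are split into chunks whose partial counts are merged; across datasets, each dataset becomes one task. This is only an illustration under stated assumptions. The function names are hypothetical, and a thread pool stands in for what would more plausibly be a process pool or a cluster scheduler for CPU-bound work.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def predicate_counts(triples):
    """Count predicate usage in one chunk of (subject, predicate, object) triples."""
    return Counter(predicate for _, predicate, _ in triples)


def dataset_stats(triples, chunk_size=2, workers=2):
    """Parallelism inside one dataset: count per chunk, then merge the partial counts."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(predicate_counts, chunked(triples, chunk_size))
        return sum(partials, Counter())


def all_stats(datasets, workers=2):
    """Parallelism across datasets: one task per dataset."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(dataset_stats, datasets.values())
        return dict(zip(datasets.keys(), results))


# Made-up datasets keyed by name
datasets = {
    "d1": [("s1", "p", "o1"), ("s1", "q", "o2"), ("s2", "p", "o3")],
    "d2": [("s3", "q", "o4")],
}
result = all_stats(datasets)
print(result)
```

Counts and sums merge by simple addition, as here. Criteria based on distinct values, such as the number of entities, would need a set union or an approximate sketch when partial results are combined.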

CSV2RDF Mass Conversion Framework

377 data catalogs are available, containing a large amount of tabular data (only 3% are available as RDF). How should that data be handled?

Results so far

Architecture

Generated Mappings

User Interface

Open Questions

Questions?

Presented by Ivan Ermilov, BIS/AKSW
