On GitHub: raymanrt / graphdbs4jug
Riccardo Tasso (@riccardotasso)
JUG Trento - 03/11/2015
You'll find the sources and examples for these slides on GitHub
If implicit schemas are such a problem
(2013, Martin Fowler)
the capability of adding a new node to a (distributed) system to improve its performance
it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees: consistency, availability, and partition tolerance
graph theory started in the 18th century
compiler optimizations
computer networks (Internet)
the WWW (hypertexts)
pipe network analysis
circuit theory
language models
protein interaction network
social networks
Any storage system that provides index-free adjacency
(Marko Rodriguez, Peter Neubauer, 2010)
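Index-free adjacency can be pictured with a small sketch: every vertex keeps direct references to its neighbours, so a traversal is plain pointer chasing with no index lookup and no join table. The class names `AdjVertex` and `IndexFreeAdjacencyDemo` are hypothetical and do not come from any graph database API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal vertex: it stores direct references to its neighbours,
// so following an edge never goes through a global index.
final class AdjVertex {
    final String id;
    final List<AdjVertex> out = new ArrayList<>();

    AdjVertex(String id) {
        this.id = id;
    }

    AdjVertex link(AdjVertex target) {
        out.add(target);
        return this;
    }
}

final class IndexFreeAdjacencyDemo {
    // Collects the ids reachable in exactly two hops by pointer chasing only.
    static List<String> twoHops(AdjVertex start) {
        List<String> ids = new ArrayList<>();
        for (AdjVertex first : start.out) {
            for (AdjVertex second : first.out) {
                ids.add(second.id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        AdjVertex nicola = new AdjVertex("nicola");
        AdjVertex oro = new AdjVertex("oro");
        AdjVertex mushroom = new AdjVertex("mushroom");
        nicola.link(oro);
        oro.link(mushroom);
        System.out.println(twoHops(nicola)); // [mushroom]
    }
}
```

The cost of each hop depends only on the number of edges of the current vertex, not on the total size of the graph.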
let's try selecting pizzas liked by Nicola!
wow, it's easier to sketch!
With a GraphDB
Unleash the power of TRAVERSAL!
At least with SQL I know how to write it:
SELECT pizza.name
FROM Person as person
JOIN Likes as likes ON person.id = likes.person
JOIN Pizza as pizza ON likes.pizza = pizza.id
JOIN Contains as contains ON pizza.id = contains.pizza
JOIN Ingredient as ingredient ON contains.ingredient = ingredient.id
JOIN TypicalOf as typicalOf ON ingredient.id = typicalOf.ingredient
JOIN Region as region ON typicalOf.region = region.id
WHERE region.name = 'taas'
but where are Nicola's friends? :(
@Test
public void myFirstGraphTest() {
    Graph graph = new TinkerGraph();

    // vertices, with optional properties
    Vertex nicola = graph.addVertex("nicola");
    nicola.setProperty("biography", "A long time ago in galaxy far far away...");
    nicola.setProperty("experience", 100);
    Vertex cristian = graph.addVertex("cristian");
    Vertex riccardo = graph.addVertex("riccardo");

    // two equivalent ways of adding a labelled edge
    Edge nicolaFriendOfCristian = graph.addEdge(null, nicola, cristian, "friendOf");
    Edge nicolaFriendOfRiccardo = nicola.addEdge("friendOf", riccardo);

    // edges can carry properties too
    System.out.println(DateTime.parse("2015-02-18"));
    nicolaFriendOfCristian.setProperty("since", DateTime.parse("2015-02-18"));
    nicolaFriendOfRiccardo.setProperty("since", DateTime.parse("2014-02-26"));

    assertEquals(3, size(graph.getVertices()));
    assertEquals(2, size(graph.getEdges()));

    graph.shutdown();
}
@Test
public void complexGraphTest() {
    Graph graph = new TinkerGraph();

    Vertex nicola = graph.addVertex("nicola");
    // property values can be collections...
    nicola.setProperty("presentations", Arrays.asList("mvn", "java8"));

    // ...or maps
    Map<String, Object> technologies = new HashMap<>();
    technologies.put("java", 20);
    technologies.put("sql", 15);
    nicola.setProperty("technologies", technologies);

    assertEquals(2, size(nicola.getPropertyKeys()));

    graph.shutdown();
}
@Test
public void iterateOverMyFirstGraphTest() {
    Graph graph = new TinkerGraph();

    Vertex nicola = graph.addVertex("nicola");
    Vertex cristian = graph.addVertex("cristian");
    Vertex riccardo = graph.addVertex("riccardo");

    Edge nicolaFriendOfCristian = graph.addEdge(null, nicola, cristian, "friendOf");
    Edge nicolaFriendOfRiccardo = nicola.addEdge("friendOf", riccardo);
    Edge riccardoFriendOfCristian = riccardo.addEdge("knows", cristian);

    System.out.println("* out degree:");
    for (Vertex v : graph.getVertices()) {
        System.out.println(format("%s has %d outgoing edges",
                v, size(v.getEdges(Direction.OUT))));
    }

    System.out.println("* degree:");
    for (Vertex v : graph.getVertices()) {
        System.out.println(format("%s's degree: %d",
                v, size(v.getEdges(Direction.BOTH))));
    }

    System.out.println("* list 'friendOf' edges:");
    for (Edge e : graph.getEdges("label", "friendOf")) {
        System.out.println(format("%s -> %s",
                e.getVertex(Direction.OUT), e.getVertex(Direction.IN)));
    }

    graph.shutdown();
}
Pipes is a lazy dataflow framework using process graphs
@Test
public void myFirstPipeTest() {
    List<String> romans = Lists.newArrayList("MMXV", "MCMLXXXIII", "I");

    TransformPipe<String, Integer> romanToInt = new RomanToIntPipe();
    FilterPipe<Integer> bigInteger = new BigIntegerPipe(1000);
    TransformPipe<Integer, Integer> makeOdd = new MakeOddPipe();

    // chain the pipes by hand: each one pulls from the previous one
    romanToInt.setStarts(romans);
    bigInteger.setStarts((Iterable<Integer>) romanToInt);
    makeOdd.setStarts((Iterable<Integer>) bigInteger);

    while (makeOdd.hasNext()) {
        System.out.println(makeOdd.next());
    }
}
output: 4031, 3967
@Test
public void metaPipeTest() {
    List<String> romans = Lists.newArrayList("MMXV", "MCMLXXXIII", "I");

    TransformPipe<String, Integer> romanToInt = new RomanToIntPipe();
    FilterPipe<Integer> bigInteger = new BigIntegerPipe(1000);
    TransformPipe<Integer, Integer> makeOdd = new MakeOddPipe();

    // a Pipeline is itself a pipe made of other pipes
    Pipeline<String, Integer> pipeline = new Pipeline<>(romanToInt, bigInteger, makeOdd);
    pipeline.enablePath(true);
    pipeline.setStarts(romans);

    while (pipeline.hasNext()) {
        System.out.println(pipeline.next());
        System.out.println(pipeline.getCurrentPath());
    }
}
output: 4031: [MMXV, 2015, 4031]
output: 3967: [MCMLXXXIII, 1983, 3967]
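The same lazy transform / filter / transform chain can be sketched with plain JDK streams, which are also pull-based. This is only an analogy, not the Pipes API: the class `LazyPipeSketch` is hypothetical, and the behaviour of `BigIntegerPipe(1000)` (keep values above 1000) and `MakeOddPipe` (2n + 1) is an assumption inferred from the outputs shown above.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class LazyPipeSketch {
    private static final Map<Character, Integer> VALUES = Map.of(
            'I', 1, 'V', 5, 'X', 10, 'L', 50, 'C', 100, 'D', 500, 'M', 1000);

    // Converts a Roman numeral using the subtractive rule (e.g. CM = 900).
    static int romanToInt(String roman) {
        int total = 0;
        for (int i = 0; i < roman.length(); i++) {
            int value = VALUES.get(roman.charAt(i));
            boolean smallerThanNext = i + 1 < roman.length()
                    && value < VALUES.get(roman.charAt(i + 1));
            total += smallerThanNext ? -value : value;
        }
        return total;
    }

    // Nothing is computed until the terminal collect() pulls elements through.
    static List<Integer> run(List<String> romans) {
        return romans.stream()
                .map(LazyPipeSketch::romanToInt) // assumed RomanToIntPipe behaviour
                .filter(n -> n > 1000)           // assumed BigIntegerPipe(1000) behaviour
                .map(n -> 2 * n + 1)             // assumed MakeOddPipe behaviour
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("MMXV", "MCMLXXXIII", "I"))); // [4031, 3967]
    }
}
```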
How is Pipes related to graphs?
@Test
public void graphPipeTest() {
    Graph graph = PizzaGraphFactory.create();

    // one step: from a vertex to its outgoing neighbours
    VerticesVerticesPipe out = new VerticesVerticesPipe(Direction.OUT);

    // keep looping until the "trentino" vertex is reached
    PipeFunction<LoopPipe.LoopBundle, Boolean> proceedCondition =
            new PipeFunction<LoopPipe.LoopBundle, Boolean>() {
        @Override
        public Boolean compute(LoopPipe.LoopBundle argument) {
            Element v = (Element) argument.getObject();
            return !v.getId().equals("trentino");
        }
    };

    LoopPipe loop = new LoopPipe(out, proceedCondition);
    Pipeline pipeline = new Pipeline(loop);
    pipeline.enablePath(true);
    pipeline.setStarts(graph.getVertices("id", "nicola"));
}
LoopPipe loop = new LoopPipe(out, proceedCondition);
Pipeline pipeline = new Pipeline(loop);
pipeline.enablePath(true);
pipeline.setStarts(graph.getVertices("id", "nicola"));

while (pipeline.hasNext()) {
    System.out.println(pipeline.next());
    System.out.println(pipeline.getCurrentPath());
}
v[trentino]: [v[nicola], v[oro], v[mushroom], v[trentino]]
v[trentino]: [v[nicola], v[oro], v[spek], v[trentino]]
v[trentino]: [v[nicola], v[riccardo], v[boscaiola], v[mushroom], v[trentino]]
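What the loop does can be sketched without any library: keep following outgoing edges, remember the path, and emit it when the target is reached. The adjacency map below is an assumed subset of the pizza graph, reconstructed only from the paths printed above; `LoopUntilSketch` is a hypothetical name, and the cycle guard is an addition of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class LoopUntilSketch {
    // Returns every path from start to target that follows outgoing edges.
    static List<List<String>> pathsTo(Map<String, List<String>> out, String start, String target) {
        List<List<String>> found = new ArrayList<>();
        walk(out, target, new ArrayList<>(List.of(start)), found);
        return found;
    }

    private static void walk(Map<String, List<String>> out, String target,
                             List<String> path, List<List<String>> found) {
        String current = path.get(path.size() - 1);
        for (String next : out.getOrDefault(current, List.of())) {
            if (path.contains(next)) {
                continue; // cycle guard: an assumption of this sketch
            }
            path.add(next);
            if (next.equals(target)) {
                found.add(new ArrayList<>(path)); // emit a copy of the current path
            } else {
                walk(out, target, path, found);   // keep looping
            }
            path.remove(path.size() - 1);
        }
    }

    public static void main(String[] args) {
        // Assumed edges, taken from the printed paths only.
        Map<String, List<String>> out = Map.of(
                "nicola", List.of("oro", "riccardo"),
                "oro", List.of("mushroom", "spek"),
                "riccardo", List.of("boscaiola"),
                "boscaiola", List.of("mushroom"),
                "mushroom", List.of("trentino"),
                "spek", List.of("trentino"));
        for (List<String> path : pathsTo(out, "nicola", "trentino")) {
            System.out.println(path);
        }
    }
}
```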
Okay Doc, bring me back to SQL!
Gremlin is a graph traversal language
rayman@HAL9100 ~/gremlin-groovy-2.6.0 $ bin/gremlin.sh

         \,,,/
         (o o)
-----oOOo-(_)-oOOo-----
gremlin> g = new TinkerGraph()
==>tinkergraph[vertices:0 edges:0]
gremlin> g.loadGraphML('/tmp/pizza.graphml')
==>null
gremlin>
gremlin> g.v('nicola').out().loop(1){it.object.id!='trentino'}.path
==>[v[nicola], v[oro], v[spek], v[trentino]]
==>[v[nicola], v[oro], v[mushroom], v[trentino]]
==>[v[nicola], v[riccardo], v[boscaiola], v[mushroom], v[trentino]]
gremlin>
gremlin> g.v('nicola').out().loop(1){it.object.id!='trentino'}.in().in().dedup()
==>v[oro]
==>v[boscaiola]
@Test
public void pizzaGremlinJavaTest() {
    Graph graph = PizzaGraphFactory.create();

    GremlinPipeline pipeline = new GremlinPipeline();
    pipeline.start(graph.getVertex("nicola"))
            .as("explore")
            .out().as("outgoing")
            .loop("explore", PipesTest.proceedCondition)
            .path();

    while (pipeline.hasNext()) {
        System.out.println(pipeline.next());
    }
}
pizzas which are liked by Nicola and have an ingredient typical of Trentino
g.V('id', 'nicola')
 .out('likes').as('pizza')
 .out('contains')
 .out('typicalOf').has('id', 'trentino')
 .back('pizza')
==>v[oro]
pizzas which contain at least one ingredient contained in Boscaiola
g.V('id', 'boscaiola').as('pizza')
 .out('contains')
 .in('contains')
 .except('pizza')
==>v[oro]
==>v[diavola]
==>v[margherita]
==>v[oro]
starting from Nicola, explore all the outgoing relations until Trentino is found
g.v('nicola')
 .out()
 .loop(1){it.object.id!='trentino'}
 .path
==>[v[nicola], v[oro], v[spek], v[trentino]]
==>[v[nicola], v[oro], v[mushroom], v[trentino]]
==>[v[nicola], v[riccardo], v[boscaiola], v[mushroom], v[trentino]]
from Cristian follow two paths: the pizzas he likes and the pizzas which are liked by his friends
g.V('id', 'cristian').copySplit(
    _().out('likes').id,
    _().both('friendOf').out('likes').id
).fairMerge()
==>diavola
==>margherita
==>oro
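The merging half of this step can be pictured as a round-robin over the branch iterators: one element from each branch in turn until all are exhausted. The sketch below is plain Java, not the Pipes implementation; `FairMergeSketch` and the sample values in `main` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class FairMergeSketch {
    // Takes one element from each branch in turn until every branch is empty.
    @SafeVarargs
    static <T> List<T> fairMerge(Iterator<T>... branches) {
        List<T> merged = new ArrayList<>();
        boolean progressed = true;
        while (progressed) {
            progressed = false;
            for (Iterator<T> branch : branches) {
                if (branch.hasNext()) {
                    merged.add(branch.next());
                    progressed = true;
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Made-up branch contents, for illustration only.
        Iterator<String> liked = List.of("a1", "a2", "a3").iterator();
        Iterator<String> likedByFriends = List.of("b1").iterator();
        System.out.println(fairMerge(liked, likedByFriends)); // [a1, b1, a2, a3]
    }
}
```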
t = new Table()
g.V().as('person')
 .out('likes')
 .out('contains').as('ingredient')
 .table(t)
gremlin> t
==>[person:v[cristian], ingredient:v[salami]]
==>[person:v[cristian], ingredient:v[tomatoe]]
==>[person:v[cristian], ingredient:v[cheese]]
==>[person:v[nicola], ingredient:v[tomatoe]]
==>[person:v[nicola], ingredient:v[cheese]]
==>[person:v[nicola], ingredient:v[cheese]]
==>[person:v[nicola], ingredient:v[spek]]
==>[person:v[nicola], ingredient:v[mushroom]]
==>[person:v[riccardo], ingredient:v[cheese]]
==>[person:v[riccardo], ingredient:v[mushroom]]
@Test
public void pizzaGremlinGroovyTest() throws ScriptException {
    Graph graph = PizzaGraphFactory.create();

    ScriptEngineManager manager = new ScriptEngineManager();
    ScriptEngine engine = manager.getEngineByName("gremlin-groovy");

    // expose Java objects to the Gremlin script
    Bindings bindings = engine.createBindings();
    bindings.put("graph", graph);
    bindings.put("nicola", graph.getVertex("nicola"));

    GremlinGroovyPipeline results = (GremlinGroovyPipeline) engine.eval(
            "nicola.out().loop(1){it.object.id!='trentino'}.path", bindings);

    while (results.hasNext()) {
        System.out.println(results.next());
    }
}
Rexster is a graph server
rayman@HAL9100 ~/rexster-server-2.6.0 $ ./bin/rexster.sh -s
http://localhost:8182/doghouse/main/graph/pizzajug
Frames exposes any Blueprints graph as a collection of interrelated domain objects (a Hibernate for graphs?!).
Furnace is a collection of graph algorithms running over the Blueprints interface
Wait a minute: I've heard of Triple Stores...
a triple is a statement regarding: a subject, a predicate, an object
PREFIX jug: <http://www.jugtaas.org/owl/jug.owl#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX po: <http://www.co-ode.org/ontologies/pizza/pizza.owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?pizza
WHERE {
  jug:nicola a foaf:Person .
  jug:nicola jug:likes ?pizza .
  ?pizza a po:Pizza .
  ?pizza po:hasIngredient ?ingredient .
  ?ingredient jug:isTypicalOf ?place .
  ?place rdfs:label "Trentino-Alto Adige"
}
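A triple store can be pictured as a flat collection of (subject, predicate, object) statements queried by pattern. The sketch below is a toy in-memory version with hypothetical class names (`Triple`, `TripleStoreSketch`) and made-up sample data; `null` plays the role of a SPARQL variable. A real store would add indexes and joins across patterns.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical value class: one statement made of subject, predicate and object.
final class Triple {
    final String subject;
    final String predicate;
    final String object;

    Triple(String subject, String predicate, String object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }
}

final class TripleStoreSketch {
    private final List<Triple> triples = new ArrayList<>();

    void add(String subject, String predicate, String object) {
        triples.add(new Triple(subject, predicate, object));
    }

    // Returns the triples matching the pattern; null matches anything.
    List<Triple> match(String subject, String predicate, String object) {
        List<Triple> result = new ArrayList<>();
        for (Triple t : triples) {
            if ((subject == null || subject.equals(t.subject))
                    && (predicate == null || predicate.equals(t.predicate))
                    && (object == null || object.equals(t.object))) {
                result.add(t);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TripleStoreSketch store = new TripleStoreSketch();
        // Sample statements, assumed for illustration.
        store.add("jug:nicola", "jug:likes", "jug:oro");
        store.add("jug:oro", "po:hasIngredient", "jug:mushroom");
        for (Triple t : store.match("jug:nicola", "jug:likes", null)) {
            System.out.println(t.object); // jug:oro
        }
    }
}
```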
Do you prefer Gremlin?
* only for those use cases that require a Graph!!!
SELECT ?name ?age
WHERE {
  ?person v:label "person" .
  ?person v:name ?name .
  ?person v:age ?age .
  ?person e:created ?project .
  FILTER (?age > 30)
}
sparql-gremlin
...the new standard could also be something else than SPARQL or Gremlin
MATCH (node1)-->(node2)
RETURN node2.propertyA, node2.propertyB
OpenCypher
Object Oriented Databases are the common ancestor of Graph Databases and Triple Stores!
ODatabaseDocumentTx db = new ODatabaseDocumentTx(DATABASE_URL).create();
initSchema(db);

ODocument luke = new ODocument("Person");
luke.field("name", "Luke");
luke.field("surname", "Skywalker");

// http://starwars.wikia.com/wiki/Luke_Skywalker
ODocument lukePhysical = new ODocument()
        .field("species", "human")
        .field("gender", "male")
        .field("height", 1.72) // implicit meters
        .field("mass", 77) // implicit kg
        .field("hair", "blonde")
        .field("eyes", "blue")
        .field("cybernetics", "Prosthetic right hand");

luke.field("physical", lukePhysical);
luke.save();
// http://starwars.wikia.com/wiki/Polis_Massa
ODocument polisMassa = new ODocument("Place")
        .field("region", "Outer Rim Territories")
        .field("sector", "Subterrel sector")
        .field("system", "Polis Massa System")
        // ...
        .save();

luke.field("born", polisMassa);
luke.save();

db.close();
System.out.println(luke);
Person#9:0 {
  name: Luke,
  surname: Skywalker,
  physical: {
    species: human, gender: male, height: 1.72, mass: 77,
    hair: blonde, eyes: blue, cybernetics: Prosthetic right hand
  },
  born: #10:0
} v1
private void initSchema(ODatabaseDocumentTx db) {
    OClass person = db.getMetadata().getSchema().createClass("Person");
    person.createProperty("physical", OType.EMBEDDED); // stored inside the Person record
    person.createProperty("born", OType.LINK);         // pointer to another record
}
Do you remember the Blueprints example? It's the same!
Graph graph = new OrientGraph(DATABASE_URL);
Vertex nicola = graph.addVertex("nicola");
...
Just remember the right implementation!
Graph Model: something more than PGM
You can also play with gremlins!
Define your domain POJO
package com.github.raymanrt.orientdb4jug.orient.starwars;

public class Person {
    private String name;
    private String surname;
    private Physical physical;
    private Place born;

    public Person() {}

    // getters and setters
}
package com.github.raymanrt.orientdb4jug.orient.starwars;

public class Jedi extends Person {
    public Jedi() {}
}
Set up the environment
OObjectDatabaseTx db = new OObjectDatabaseTx(DATABASE_URL).create();

// register every POJO found in the package as a database class
db.getEntityManager()
  .registerEntityClasses("com.github.raymanrt.orientdb4jug.orient.starwars");

OClass person = db.getMetadata().getSchema().getClass("Person");
person.createProperty("physical", OType.EMBEDDED);
Work with your data
Person padme = db.newInstance(Person.class);
padme.setName("Padme");
padme.setSurname("Amidala");
db.save(padme);

Jedi luke = db.newInstance(Jedi.class);
luke.setName("Luke");
luke.setSurname("Skywalker");

Physical physical = new Physical();
physical.setHair("blonde");
physical.setEyes("blue");
luke.setPhysical(physical);

Place polisMassa = db.newInstance(Place.class);
polisMassa.setName("Polis Massa");
// ...
db.save(polisMassa);

luke.setBorn(polisMassa);
db.save(luke);
Work with your data
assertEquals(2, db.countClass("Person")); // a Jedi is also a Person
assertEquals(1, db.countClass("Jedi"));
assertEquals(1, db.countClass("Place"));

ODocument lukeAsDocument = db.getRecordByUserObject(luke, false);
assertNotNull(lukeAsDocument.getIdentity());
assertNotEquals(ORecordId.EMPTY_RECORD_ID, lukeAsDocument.getIdentity());
System.out.println(lukeAsDocument);
inspired by SQL to be friendly
each field can be declared as:
SELECT name, @rid, out('likes') FROM Person WHERE name = 'Nicola'
operators:
SELECT name
FROM Pizza
WHERE 'trentino' in out('contains').out('typicalOf').name
  AND 'nicola' in in('likes').name
SELECT name
FROM Pizza
LET $typicalPlace = out('contains').out('typicalOf').name
WHERE 'trentino' in $typicalPlace
  AND 'nicola' in in('likes').name
SELECT name
FROM Pizza
LET $typicalPlace = out('contains').out('typicalOf').name,
    $nicolasFriends = ( SELECT FROM Person WHERE 'nicola' in both('friendOf').name ),
    $pizzaLikers = in('likes')
WHERE 'trentino' in $typicalPlace
  AND ('nicola' in $pizzaLikers.name OR $nicolasFriends in $pizzaLikers)
TRAVERSE out('friendOf') FROM (SELECT FROM Person WHERE name = 'nicola') WHILE $depth <= 3 STRATEGY BREADTH_FIRST
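A breadth-first traversal with a depth limit can be sketched in plain Java as follows. This only illustrates the strategy and is not how OrientDB implements TRAVERSE; the class `BoundedBfsSketch`, the adjacency map, and the choice to count the starting record as depth 0 are assumptions of the sketch.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class BoundedBfsSketch {
    // Visits vertices level by level, stopping after maxDepth hops from start.
    static List<String> traverse(Map<String, List<String>> friendOf, String start, int maxDepth) {
        List<String> order = new ArrayList<>();
        Set<String> seen = new HashSet<>(List.of(start));
        List<String> frontier = List.of(start);
        for (int depth = 0; depth <= maxDepth && !frontier.isEmpty(); depth++) {
            List<String> next = new ArrayList<>();
            for (String vertex : frontier) {
                order.add(vertex);
                for (String neighbour : friendOf.getOrDefault(vertex, List.of())) {
                    if (seen.add(neighbour)) { // true only the first time we meet it
                        next.add(neighbour);
                    }
                }
            }
            frontier = next;
        }
        return order;
    }

    public static void main(String[] args) {
        // Made-up friendship chain, for illustration only.
        Map<String, List<String>> friendOf = Map.of(
                "a", List.of("b"),
                "b", List.of("c"),
                "c", List.of("d"),
                "d", List.of("e"));
        System.out.println(traverse(friendOf, "a", 3)); // [a, b, c, d]
    }
}
```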
SELECT shortestPath(#8:32, #8:10, 'OUT', 'friendOf')
SELECT dijkstra(#8:32, #8:10, 'weightEdgeFieldName', 'OUT')
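For reference, the idea behind a Dijkstra-style cheapest path over weighted outgoing edges can be sketched in plain Java. This is a textbook version, not OrientDB's implementation; `DijkstraSketch`, `WeightedEdge` and the sample graph are hypothetical, and weights are assumed to be non-negative.

```java
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

final class DijkstraSketch {
    // Hypothetical edge type: target vertex id plus the numeric weight property.
    static final class WeightedEdge {
        final String to;
        final double weight;

        WeightedEdge(String to, double weight) {
            this.to = to;
            this.weight = weight;
        }
    }

    static WeightedEdge edge(String to, double weight) {
        return new WeightedEdge(to, weight);
    }

    // Returns the cheapest path from source to target, or an empty list if unreachable.
    static List<String> cheapestPath(Map<String, List<WeightedEdge>> out, String source, String target) {
        Map<String, Double> dist = new HashMap<>();
        Map<String, String> previous = new HashMap<>();
        PriorityQueue<Map.Entry<String, Double>> queue =
                new PriorityQueue<>(Map.Entry.<String, Double>comparingByValue());
        dist.put(source, 0.0);
        queue.add(Map.entry(source, 0.0));
        while (!queue.isEmpty()) {
            Map.Entry<String, Double> head = queue.poll();
            String u = head.getKey();
            if (head.getValue() > dist.get(u)) {
                continue; // stale queue entry, a cheaper route was found meanwhile
            }
            if (u.equals(target)) {
                break;
            }
            for (WeightedEdge e : out.getOrDefault(u, List.of())) {
                double candidate = head.getValue() + e.weight;
                if (candidate < dist.getOrDefault(e.to, Double.POSITIVE_INFINITY)) {
                    dist.put(e.to, candidate);
                    previous.put(e.to, u);
                    queue.add(Map.entry(e.to, candidate));
                }
            }
        }
        if (!dist.containsKey(target)) {
            return List.of();
        }
        LinkedList<String> path = new LinkedList<>();
        for (String v = target; v != null; v = previous.get(v)) {
            path.addFirst(v);
        }
        return path;
    }

    public static void main(String[] args) {
        Map<String, List<WeightedEdge>> out = new HashMap<>();
        out.put("a", List.of(edge("b", 1), edge("c", 5)));
        out.put("b", List.of(edge("c", 1)));
        System.out.println(cheapestPath(out, "a", "c")); // [a, b, c]
    }
}
```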
Try with orientqb (inspired by jOOQ)
Do you prefer this:
Query q = new Query()
    .select(Projection.ALL)
    .from("Class")
    .where(projection("f2").eq(5))
    .where(projection("f3").lt(0));
Or that?
String q = "SELECT * "
    + "FROM Class "
    + "WHERE f2 = 5 AND f3 < 0";
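To show why a builder helps, here is a deliberately tiny fluent builder in plain Java. It is not the orientqb API: `TinyQuery` and its methods are hypothetical, and conditions are passed as raw strings for brevity. The point is that spacing, keyword order and the AND between clauses are handled in one place instead of in every concatenation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical mini builder, for illustration only.
final class TinyQuery {
    private final List<String> projections = new ArrayList<>();
    private final List<String> conditions = new ArrayList<>();
    private String target = "";

    TinyQuery select(String... fields) {
        projections.addAll(Arrays.asList(fields));
        return this;
    }

    TinyQuery from(String target) {
        this.target = target;
        return this;
    }

    // Each call adds one condition; conditions are joined with AND.
    TinyQuery where(String condition) {
        conditions.add(condition);
        return this;
    }

    @Override
    public String toString() {
        String query = "SELECT " + String.join(", ", projections) + " FROM " + target;
        if (!conditions.isEmpty()) {
            query += " WHERE " + String.join(" AND ", conditions);
        }
        return query;
    }

    public static void main(String[] args) {
        TinyQuery q = new TinyQuery()
                .select("*")
                .from("Class")
                .where("f2 = 5")
                .where("f3 < 0");
        System.out.println(q); // SELECT * FROM Class WHERE f2 = 5 AND f3 < 0
    }
}
```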
our use case for OrientDB
main query:
SELECT name, label,
       in('hasActant').out('relatedToChapter') as chapters,
       in('hasActant').out('locatedIn') as locations,
       in('hasTag') as contributions
FROM #21:6
speech cloud:
SELECT localName, $seq.size()
FROM Agent
LET $seq = (
    SELECT FROM Sequence
    LET $speakers = in('inSequence').out('speaker')
    WHERE $parent.$current in $speakers
      AND #21:6 in $speakers
)
ORDER BY localName
how would you model an edge?
regular edges
lightweight edges