Gremlin is a
graph traversal
In computer science, graph traversal (also known as graph search) refers to the process of visiting (checking and/or updating) each vertex in a graph. Such traversals are classified by the order in which the vertices are visited. Tree traversa ...
language and
virtual machine
In computing, a virtual machine (VM) is the virtualization or emulator, emulation of a computer system. Virtual machines are based on computer architectures and provide the functionality of a physical computer. Their implementations may involve ...
developed by Apache TinkerPop of the
Apache Software Foundation
The Apache Software Foundation ( ; ASF) is an American nonprofit corporation (classified as a 501(c)(3) organization in the United States) to support a number of open-source software projects. The ASF was formed from a group of developers of the ...
. Gremlin works for both
OLTP-based graph databases as well as
OLAP
In computing, online analytical processing (OLAP) (), is an approach to quickly answer multi-dimensional analytical (MDA) queries. The term ''OLAP'' was created as a slight modification of the traditional database term online transaction processi ...
-based graph processors. Gremlin's
automata and
functional language foundation enable Gremlin to naturally support:
imperative and
declarative querying; host language agnosticism; user-defined
domain specific languages; an extensible compiler/optimizer, single- and multi-machine execution models; hybrid depth- and breadth-first evaluation with
Turing completeness
In computability theory, a system of data-manipulation rules (such as a model of computation, a computer's instruction set, a programming language, or a cellular automaton) is said to be Turing-complete or computationally universal if it can b ...
.
As an explanatory analogy, Apache TinkerPop and Gremlin are to
graph databases what the
JDBC
Java Database Connectivity (JDBC) is an application programming interface (API) for the Java (programming language), Java programming language which defines how a client may access a database. It is a Java-based data access technology used for Java ...
and
SQL are to
relational databases. Likewise, the Gremlin traversal machine is to graph computing as what the
Java virtual machine
A Java virtual machine (JVM) is a virtual machine that enables a computer to run Java programs as well as programs written in other languages that are also compiled to Java bytecode. The JVM is detailed by a specification that formally descr ...
is to general purpose computing.
History
* 2009-10-30 the project is born, and immediately named "TinkerPop"
* 2009-12-25 v0.1 is the first release
* 2011-05-21 v1.0 is released
* 2012-05-24 v2.0 is released
* 2015-01-16 TinkerPop becomes an Apache Incubator project
* 2015-07-09 v3.0.0-incubating is released
* 2016-05-23 Apache TinkerPop becomes a top-level project
* 2016-07-18 v3.1.3 and v3.2.1 are first releases as Apache TinkerPop
* 2017-12-17 v3.3.1 is released
* 2018-05-08 v3.3.3 is released
* 2019-08-05 v3.4.3 is released
* 2020-02-20 v3.4.6 is released
Vendor integration
Gremlin is an
Apache2-licensed graph traversal language that can be used by graph system vendors. There are typically two types of graph system vendors: OLTP
graph databases and OLAP graph processors. The table below outlines those graph vendors that support Gremlin.
Traversal examples
The following examples of Gremlin queries and responses in a Gremlin-Groovy environment are relative to a graph representation of th
MovieLensdataset.
The dataset includes users who rate movies. Users each have one occupation, and each movie has one or more categories associated with it. The MovieLens graph schema is detailed below.
user--rated tars:0-5->movie
user--occupation-->occupation
movie--category-->category
Simple traversals
gremlin> g.V().label().groupCount()
> ccupation:21, movie:3883, category:18, user:6040
gremlin> g.V().hasLabel('movie').values('year').min()
>1919
gremlin> g.V().has('movie','name','Die Hard').inE('rated').values('stars').mean()
>4.121848739495798
Projection traversals
gremlin> g.V().hasLabel('category').as('a','b').
select('a','b').
by('name').
by(inE('category').count())
> :Animation, b:105h1>
> :Children's, b:251h1>
> :Comedy, b:1200h1>
> :Adventure, b:283h1>
> :Fantasy, b:68h1>
> :Romance, b:471h1>
> :Drama, b:1603h1>
> :Action, b:503h1>
> :Crime, b:211h1>
> :Thriller, b:492h1>
> :Horror, b:343h1>
> :Sci-Fi, b:276h1>
> :Documentary, b:127h1>
> :War, b:143h1>
> :Musical, b:114h1>
> :Mystery, b:106h1>
> :Film-Noir, b:44h1>
> :Western, b:68
gremlin> g.V().hasLabel('movie').as('a','b').
where(inE('rated').count().is(gt(10))).
select('a','b').
by('name').
by(inE('rated').values('stars').mean()).
order().by(select('b'),decr).
limit(10)
> :Sanjuro, b:4.608695652173913h1>
> :Seven Samurai (The Magnificent Seven), b:4.560509554140127h1>
> :Shawshank Redemption, The, b:4.554557700942973h1>
> :Godfather, The, b:4.524966261808367h1>
> :Close Shave, A, b:4.52054794520548h1>
> :Usual Suspects, The, b:4.517106001121705h1>
> :Schindler's List, b:4.510416666666667h1>
> :Wrong Trousers, The, b:4.507936507936508h1>
> :Sunset Blvd. (a.k.a. Sunset Boulevard), b:4.491489361702127h1>
> :Raiders of the Lost Ark, b:4.47772
Declarative pattern matching traversals
Gremlin supports declarative graph pattern matching similar to
SPARQL. For instance, the following query below uses Gremlin's ''match()''-step.
gremlin> g.V().
match(
__.as('a').hasLabel('movie'),
__.as('a').out('category').has('name','Action'),
__.as('a').has('year',between(1980,1990)),
__.as('a').inE('rated').as('b'),
__.as('b').has('stars',5),
__.as('b').outV().as('c'),
__.as('c').out('occupation').has('name','programmer'),
__.as('c').has('age',between(30,40))).
select('a').groupCount().by('name').
order(local).by(valueDecr).
limit(local,10)
>Raiders of the Lost Ark=26
>Star Wars Episode V - The Empire Strikes Back=26
>Terminator, The=23
>Star Wars Episode VI - Return of the Jedi=22
>Princess Bride, The=19
>Aliens=18
>Boat, The (Das Boot)=11
>Indiana Jones and the Last Crusade=11
>Star Trek The Wrath of Khan=10
>Abyss, The=9
OLAP traversal
gremlin> g = graph.traversal(computer(SparkGraphComputer))
>graphtraversalsource gryooutputformat">adoopgraph[gryoinputformat->gryooutputformat sparkgraphcomputer">ryoinputformat->gryooutputformat.html" ;"title="adoopgraph[gryoinputformat->gryooutputformat">adoopgraph[gryoinputformat->gryooutputformat sparkgraphcomputergremlin> g.V().repeat(outE('rated').has('stars', 5).inV().
groupCount('m').by('name').
inE('rated').has('stars', 5).outV()).
times(4).cap('m')
>Star Wars Episode IV - A New Hope 35405394353105332
>American Beauty 31943228282020585
>Raiders of the Lost Ark 31224779793238499
>Star Wars Episode V - The Empire Strikes Back 30434677119726223
>Godfather, The 30258518523013057
>Shawshank Redemption, The 28297717387901031
>Schindler's List 27539336654199309
>Silence of the Lambs, The 26736276376806173
>Fargo 26531050311325270
>Matrix, The 26395118239203191
Gremlin graph traversal machine
Gremlin is a
virtual machine
In computing, a virtual machine (VM) is the virtualization or emulator, emulation of a computer system. Virtual machines are based on computer architectures and provide the functionality of a physical computer. Their implementations may involve ...
composed of an
instruction set
In computer science, an instruction set architecture (ISA) is an abstract model that generally defines how software controls the CPU in a computer or a family of computers. A device or program that executes instructions described by that ISA, s ...
as well as an execution engine. An analogy is drawn between Gremlin and Java (programming language), Java.
Gremlin steps (instruction set)
The following traversal is a Gremlin traversal in the Gremlin-Java8 dialect.
g.V().as("a").out("knows").as("b").
select("a","b").
by("name").
by("age")
The Gremlin language (i.e. the
fluent-style of expressing a graph traversal) can be represented in any host language that supports
function composition
In mathematics, the composition operator \circ takes two function (mathematics), functions, f and g, and returns a new function h(x) := (g \circ f) (x) = g(f(x)). Thus, the function is function application, applied after applying to . (g \c ...
and
function nesting. Due to this simple requirement, there exists various Gremlin dialects including Gremlin-Groovy, Gremlin-Scala, Gremlin-Clojure, etc. The above Gremlin-Java8 traversal is ultimately compiled down to a step sequence called a ''traversal''. A string representation of the traversal above provided below.
raphStep([vertex)@[a">html" ;"title="raphStep([">raphStep([vertex)@[a VertexStep(OUT,[knows">quot;>raphStep([<_a>vertex)@ raphStep([vertex)@[a">html" ;"title="raphStep([">raphStep([vertex)@[a VertexStep(OUT,[knowsvertex)@[b">.html" ;"title="html" ;"title="raphStep([">raphStep([vertex)@[a">html" ;"title="raphStep([">raphStep([vertex)@[a VertexStep(OUT,[knowsvertex)@[b SelectStep([a, b],[value(name), value(age)])]
The ''steps'' are the primitives of the Gremlin graph traversal machine. They are the parameterized instructions that the machine ultimately executes. The Gremlin
instruction set
In computer science, an instruction set architecture (ISA) is an abstract model that generally defines how software controls the CPU in a computer or a family of computers. A device or program that executes instructions described by that ISA, s ...
is approximately 30 steps. These steps are sufficient to provide general purpose computing and what is typically required to express the common motifs of any graph traversal query.
Given that Gremlin is a language, an instruction set, and a virtual machine, it is possible to design another traversal language that compiles to the Gremlin traversal machine (analogous to how Scala compiles to the
JVM). For instance, the popular
SPARQL graph pattern match language can be compiled to execute on the Gremlin machine. The following SPARQL query
SELECT ?a ?b ?c
WHERE
would compile to
raphStep([vertex), MatchStep(AND,MatchStartStep(a), LabelStep, IsStep(eq(Person)), MatchEndStep">html" ;"title="raphStep([">raphStep([vertex), MatchStep(AND,MatchStartStep(a), LabelStep, IsStep(eq(Person)), MatchEndStep [MatchStartStep(a), VertexStep(OUT,[knows],vertex), MatchEndStep(b)], [MatchStartStep(a), VertexStep(OUT,[created],vertex), MatchEndStep(c)], [MatchStartStep(b), VertexStep(OUT,[created],vertex), MatchEndStep(c)], [MatchStartStep(b), PropertiesStep( gevalue), MatchEndStep(d)], atchStartStep(d), IsStep(gt(30)), MatchEndStep), SelectStep( , b, c].
In Gremlin-Java8, the SPARQL query above would be represented as below and compile to the identical Gremlin step sequence (i.e. traversal).
g.V().match(
as("a").label().is("person"),
as("a").out("knows").as("b"),
as("a").out("created").as("c"),
as("b").out("created").as("c"),
as("b").values("age").as("d"),
as("d").is(gt(30))).
select("a","b","c")
Gremlin Machine (virtual machine)
The Gremlin graph traversal machine can execute on a single machine or across a multi-machine compute cluster. Execution agnosticism allows Gremlin to run over both
graph databases (OLTP) and graph processors (OLAP).
See also
*
Cypher Query Language
Cypher is a declarative graph query language that allows for expressive and efficient data querying in a property graph.
Cypher was largely an invention of Andrés Taylor while working for Neo4j, Inc. (formerly Neo Technology) in 2011. Cypher wa ...
, another query language on graph data
*
SPARQL, another query language on graph data
*
Regular path query, a theoretical query language on graph data
*
Graph query language, a proposed standardization of a query language on graph data
References
External links
Apache TinkerPop Homepagesql2gremlin.com (TinkerPop2)# Rodriguez, M.A.,
The Gremlin Graph Traversal Machine and Language" Proceedings of the ACM Database Programming Languages Conference, October, 2015.
{{Query languages
Cluster computing
Data mining and machine learning software
Hadoop
Apache Software Foundation
Software using the Apache license
Java platform
Free software programmed in Scala
Declarative programming languages
Query languages