Going Beyond Cypher in Neo4j with JavaScript

Cypher is a powerful and expressive language. As it evolves, the query planner keeps improving and delivering better performance. For example, in Neo4j 3.5, traversing a binary tree of depth 500 to count the nodes connected to the root took more than a few seconds. From 4.4 onwards it takes only a few milliseconds. The language itself has also been extended with richer query expressions and subqueries, which let us build complex queries efficiently that were not possible earlier.

There are still quite a few scenarios where Cypher can be limiting. Say you have a complex object that you want to ingest by calculating the delta of changes, or you need to traverse the graph in a specific manner while collecting only part of the information. In these situations a stored procedure can process the data faster and with less memory. The APOC procedures are great examples of this. As Cypher evolved, APOC kept adding functionality through stored procedures so that more could be done from Cypher. A great example of how stored procedures can get the most out of a graph is “Writing a Graph Database Stored Procedure in Neo4j — Part I”.

There are a lot of examples and pretty good documentation (User Defined Procedures and Functions — Developer Guides (neo4j.com)) on how to write stored procedures in Java, Kotlin, etc. But the biggest roadblock is setting up a Java or Kotlin development environment. And to deploy, you need to copy the jar file to the plugins directory and restart the instance. On a cluster this can be a painful process.

To alleviate this and make the process more developer friendly, I looked at using JavaScript as the language. The idea is to register a JavaScript function as a stored procedure and invoke it from Cypher. To achieve this we built a plugin that provides two methods, “register” and “invoke”. Each database has a ScriptEngine associated with it, which keeps the JavaScript functions compiled and ready to be invoked. Because validation happens at registration rather than at runtime, invocation is faster. The register method validates the script and performs a few security checks, creates a node to persist the script in the database, and adds the script to the script engine. Note that the registered functions can only take a Map as input, which keeps the signature simple and fixed. The invoke method first checks whether the function exists. If it does, invoke injects the current transaction and log instances into the input and executes the function. The response is added to the output map and returned. One thing to note: since the JavaScript engine ties into Java, all the Java APIs of the Neo4j database are available to the JavaScript functions.
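To make the register/invoke flow concrete, here is a rough conceptual model in plain JavaScript. It is not the plugin's code: the real plugin is written in Java, persists scripts as nodes, and runs security checks that are omitted here. The names createRegistry, register and invoke, and their argument lists, are made up for illustration.

```javascript
// Conceptual model only. The actual plugin is a Java extension that keeps
// one ScriptEngine per database; every name in this sketch is hypothetical.
function createRegistry() {
    var compiled = {};   // public name -> compiled function

    return {
        // Compile and check the script once, at registration time,
        // so that invoke does not have to validate anything.
        register: function (script, publicName) {
            var fn = new Function('return (' + script + ');')();
            if (typeof fn !== 'function') {
                throw new Error('Script must define a function');
            }
            compiled[publicName] = fn;
        },

        // Look up the function, inject txn and log into a copy of the
        // input map, run it, and wrap the response in an output map.
        invoke: function (publicName, input, txn, log) {
            if (!Object.prototype.hasOwnProperty.call(compiled, publicName)) {
                throw new Error('No such procedure: ' + publicName);
            }
            var params = {};
            for (var key in input) {
                if (Object.prototype.hasOwnProperty.call(input, key)) {
                    params[key] = input[key];
                }
            }
            params.txn = txn;
            params.log = log;
            return { result: compiled[publicName](params) };
        }
    };
}
```

The point of the sketch is the split of work: the expensive step (compiling and checking the script) happens once in register, while invoke only does a lookup, builds the input map, and calls the function.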

Let’s take a look at a Java procedure and the corresponding version in JavaScript.

@Context
public Transaction tx;

@Procedure(name = "example.countNodes", mode = Mode.READ)
public Stream<Output> countNodes() {
    return Stream.of(new Output(tx.getAllNodes().stream().count()));
}

This is the Java version of a stored procedure that returns the count of nodes in the database.

function nodeCount(params)
{
    var log = params['log'] ;
    var txn = params['txn'] ;
    log.info('Testing log') ;
    return txn.getAllNodes().stream().count()
}

We can see that the JavaScript version is pretty similar to the Java version. One major difference is that the database injects the transaction into the Java procedure, whereas the JavaScript function receives it through the input map.
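Since the function receives everything through one map, caller-supplied values can presumably travel the same way. The following is a minimal sketch under that assumption: countHubs and the minDegree key are invented for illustration, and the function relies on the engine converting a JavaScript function into a Java Predicate for Stream.filter, which Nashorn does for functional interfaces.

```javascript
// Hypothetical example: count the nodes that have at least
// params['minDegree'] relationships. 'minDegree' is an assumed
// caller-supplied key; 'txn' is injected by the plugin's invoke method.
function countHubs(params)
{
    var txn = params['txn'] ;
    var minDegree = params['minDegree'] ;
    return txn.getAllNodes().stream()
        .filter(function (node) { return node.getDegree() >= minDegree ; })
        .count()
}
```

If this were registered under a public name such as hubs (also made up), the call would presumably look like js.procedure.invoke("hubs", {minDegree: 100}), with the threshold passed in the map argument.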

Now let’s look at how to register and invoke the functions.

CALL js.procedure.register(
"function nodeCount(params)
{
    var log = params['log'] ;
    var txn = params['txn'] ;
    log.info('Testing log') ;
    return txn.getAllNodes().stream().count()
}", "nodecount", {})

This Cypher request registers a JavaScript function named nodeCount under the public name nodecount. Note that the actual JavaScript function name and the publicly referenced name can differ. This separates the namespaces, and may in future allow private JavaScript functions that cannot be invoked explicitly but support other public functions. The last parameter, an empty Map, has no effect at this time and will support configuration in the future.

CALL js.procedure.invoke("nodecount", {}) YIELD map
RETURN map.result AS result

The plugin is available at https://github.com/neo4j-field/js-stored-procedures. The repository has three branches. The main branch supports Neo4j 5.3 and is built on the Nashorn JS engine. The neo_4_4_compatible branch supports Neo4j 4.4.x and is also built on the Nashorn JS engine. The graalvm_compatible branch supports Neo4j 5.3; it uses the Nashorn API to get the parse tree of the JavaScript function for validation and the GraalVM engine to actually execute the functions.

Some not-so-scientific testing shows no performance degradation when using JavaScript to build stored procedures.

The Good:

  1. There is no need to set up a development environment to write stored procedures.
  2. There is no need to restart the cluster when you make changes to a procedure.
  3. A procedure is limited to the database in which it is created. This can be a huge security feature in a multi-database environment. A deployed Java stored procedure, by contrast, can be invoked against any database except the system database.
  4. Creating, updating and deleting stored procedures can be limited to administrators by leveraging the RBAC functionality of Neo4j.
  5. Since it is simple to update or delete a procedure, critical fixes can be deployed quickly and efficiently.

The Bad:

  1. As of now there is no testing framework.
  2. When you change or delete a function, it may take up to an hour for the change to be reflected on follower nodes in a cluster. This delay is controlled by the values on the JS_ProcedureLock node.

Don’t miss my book on Cypher, Graph Data Processing with Cypher.

#javascript #neo4j #cypher #packt

