Using parameters with the Query API
Imagine that you find yourself needing to perform bulk updates in some shape or form to Neo4j Graph database; it could be a data import, creating relationships between nodes, changing properties or anything else that requires a lot of mutations.
As an example of this, consider the work from an earlier blog post where we represented the London Underground as a graph; Nodes are used for Tube stations and Relationships are the Tube lines that form the network.
Take the Circle Line. There are 38 tube stations, each of which will be a Node and we will store this information about them
- Name
- London Underground ID
- Long
- Lat
- Tube Zone
To put this into Neo4j we could write a Cypher statement for each station. For example, create a node for Goldhawk Road Tube Station
MERGE ( s:Station {id:, name:"Goldhawk Road Underground Station" , zone:2, lat:51.502005, lon:-0.226715})
and then carry on with another 37 Cypher statements for all of the remaining tube stations on the Circle line.
Sending a single Cypher statement in this way is not very efficient. Apart from the overhead of establishing network connection each and every time, it skips over an optimisation that’s available with Neo4j - Parameterized Queries
Parameterized queries
A parameterized query is a query in which placeholders are used for parameters and the parameter values are supplied at execution time. This means you don’t need to resort to using multiple statements mentioned previously. It also means that you can avoid string building to create a query and having to escape problematic characters. Additionally, parameters make caching of execution plans much easier for Cypher, thus leading to significantly faster query execution times.
Parameterized queries are available for direct use with Cypher , Neo4j drivers and the Query API.
The Query API supports the use of parameters ( link to docs page ) with a request body that looks like this
{
'statement': Parameterized Cypher statement that references values,
'parameters': { "key" : [{Values}] }
}
If we used Query API with parameters for the Circle Line , then we could do this in a single request ( I’ll just put 5 Circle Line stations in to keep it readable ).
{
'statement':'WITH $items as batch UNWIND batch as item MERGE ( s:Station {id: item.id, name: item.name, zone: item.zone, lat:item.lat, lon:item.lon }),
'parameters': { "items" : [
{'id': '940GZZLUHSC', 'name': 'Hammersmith (H&C Line) Underground Station', 'zone': '2', 'lat': 51.49339, 'lon': -0.225033},
{'id': '940GZZLUGHK', 'name': 'Goldhawk Road Underground Station', 'zone': '2', 'lat': 51.502005, 'lon': -0.226715},
{'id': '940GZZLUSBM', 'name': "Shepherd's Bush Market Underground Station", 'zone': '2', 'lat': 51.505579, 'lon': -0.226375},
{'id': '940GZZLUWLA', 'name': 'Wood Lane Underground Station', 'zone': '2', 'lat': 51.509669, 'lon': -0.22453},
{'id': '940GZZLULRD', 'name': 'Latimer Road Underground Station', 'zone': '2', 'lat': 51.513389, 'lon': -0.217799}
]
}
}
Lets dig into this a bit deeper and see what is going on
We’ll start with parameters.
'parameters': { "items" : [
{'id': '940GZZLUHSC', 'name': 'Hammersmith (H&C Line) Underground Station', 'zone': '2', 'lat': 51.49339, 'lon': -0.225033},
{'id': '940GZZLUGHK', 'name': 'Goldhawk Road Underground Station', 'zone': '2', 'lat': 51.502005, 'lon': -0.226715},
{'id': '940GZZLUSBM', 'name': "Shepherd's Bush Market Underground Station", 'zone': '2', 'lat': 51.505579, 'lon': -0.226375},
{'id': '940GZZLUWLA', 'name': 'Wood Lane Underground Station', 'zone': '2', 'lat': 51.509669, 'lon': -0.22453},
{'id': '940GZZLULRD', 'name': 'Latimer Road Underground Station', 'zone': '2', 'lat': 51.513389, 'lon': -0.217799}
]
}
This is an array of JSON objects, each object having key:value pairs. The array is referenced by its key, items
Onto the Cyper statement. We can see the key name being used - items - and the array is going to be referenced by the label batch
WITH $items as batch
Next step is to loop around each JSON object in the array like so
UNWIND batch as item
Then we get to the Cypher statement that creates a new node if it does not already exist. We obtain the values we want to use by refering to the key name e.g item.name
MERGE ( s:Station {id: item.id, name: item.name, zone: item.zone, lat:item.lat, lon:item.lon })
What difference does it make?
To illustrate the difference parameters make, we’ll run a comparative test that creates the entire London Tube Network in Neo4j. We’ll use Neo4j Desktop as Query APi is not ( yet ) available in Aura. For the code, we can modify that used in previous articles on this topic.
As for timing, we’ll use a Python decorator to see how long the Neo4j request function call takes. Here it is , thanks to realpython.com
def timer(func):
@functools.wraps(func)
def wrapper_timer(*args, **kwargs):
tic = time.perf_counter()
value = func(*args, **kwargs)
toc = time.perf_counter()
elapsed_time = toc - tic
return value
return wrapper_timer
As we want to focus on Neo4j, lets re-use network connection to avoid the overhead of establishing one for each request. We will be using Neo4j Desktop locally so network conditions should not be an issue.
Another variable outside of our control is the response of the TFL API. As we’re not measuring that, we can ignore it.
Each test, with Parameters and then without Parameters, was run 3 times using these steps each time Once you have everything ready
- Start Neo4j
- Run Python code
- Note timing
- Empty Neo4j database with MATCH(n) DETACH DELETE(n)
- Stop Neo4j
The resulsts are as follows. All timings are in seconds
Test run | With Parameters | Without Parameters |
---|---|---|
1 | 2.5783 | 15.1745 |
2 | 3.0867 | 15.5487 |
3 | 2.6416 | 16.8796 |
Average | 2.6416 | 15.5487 |
It’s hard to avoid the conclusion - use parameters, they save a lot of time
Try it out at home
If you want to follow along at home then you will need
- Neo4j Desktop
- Using Neo4j Desktop, create a new database that uses Neo4j 5.19
- Change the database configuraiton to add this line
server.http_enabled_modules=TRANSACTIONAL_ENDPOINTS,UNMANAGED_EXTENSIONS,BROWSER,ENTERPRISE_MANAGEMENT_ENDPOINTS,QUERY_API_ENDPOINTS
- Python 3.12 or newer
- TFL API key
- The two Python files, With parameters and Without parameters
- Modified the Python files to use your Neo4j username , password and your TLF API key
Laters
JG