Stress Knowledge Map

Supported by the ADAPT project

PSS neo4j database

Basic schema

The basic schema of the PSS database (in terms of reactions) is shown below:

For a more detailed description of the schema, see: PSS database schema

Cypher querying language

The standard query language for working with a neo4j database is Cypher. In Cypher, nodes are written in parentheses, (), and relationships in square brackets with an arrow, -[]->. A directed relationship (edge) therefore has the following general syntax:

(source)-[relationship]->(target)

The following patterns apply to nodes:

  • ()  any node, no assigned variable
  • (:Metabolite)  any Metabolite node
  • (m:Metabolite)  any Metabolite node, assigned to the variable m
  • ({<property>:<value>})  any node whose property equals value

The following patterns apply to relationships:

  • ()--()  any relationship
  • ()-[:SUBSTRATE]-()  any SUBSTRATE relationship
  • ()-[{<property>:<value>}]-()  any relationship whose property equals value
  • ()-[p:SUBSTRATE]-()  any SUBSTRATE relationship, assigned to the variable p
  • (:Metabolite)-[:SUBSTRATE]-(:Reaction)  any SUBSTRATE relationship between a Metabolite and a Reaction, in either direction
  • (:Metabolite)-[:SUBSTRATE]->(:Reaction)  any SUBSTRATE relationship with a Metabolite as source and a Reaction as target (i.e. directed)

Basic query structures:

  • MATCH <pattern>  searches the database for the given pattern (comparable to SELECT in SQL)
  • OPTIONAL MATCH <pattern>  like MATCH, but when there is no match it returns NULL for the missing parts instead of dropping the row
  • WHERE <condition>  filters the results; conditions can be combined with AND, OR, XOR and NOT
  • WHERE <entity.property> <comparison> <value>  comparisons include:
      =, <>, <, >, <=, >=
      =~ (regular expression match)
      IN (membership in a list)
  • WHERE further filter options:
      exists(<pattern>)
      <entity.property> STARTS WITH <string>
      <entity.property> ENDS WITH <string>
      <entity.property> CONTAINS <string>
  • RETURN <result>  returns the values or results of a query
  • RETURN <result> AS <alias>  returns the result under an alias key instead of the property name
  • DISTINCT  restricts the result to a unique set (e.g. RETURN DISTINCT)

Further statements:

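  • CREATE  creates nodes and relationships
  • SET  sets properties or labels
  • MERGE  matches a pattern, or creates it if it does not exist yet
  • DELETE  deletes nodes and relationships
  • REMOVE  removes properties or labels

See also result aggregation (e.g. count() and collect()), sorting (ORDER BY) and related features.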

Here are two sources for further reading:

Example cypher statements

Fetch the names of all nodes:

    MATCH (n)
    RETURN DISTINCT n.name AS name

Fetch the names of all neighbours of JAR, ignoring relationship direction:

    MATCH (n {name:"JAR"})--(other)
    RETURN DISTINCT other.name AS name

Installation and deployment with Docker

Clone the repository to your computer:

    git clone https://github.com/nib-si/skm-neo4j.git

The correct branch is ---:

    cd skm-neo4j
    git checkout ---

Repository structure

The repository is organised as follows:

      skm-neo4j
        ├─ docker-compose.yaml    docker-compose file
        ├─ docker-entrypoint.sh   neo4j startup script
        ├─ dockerfile             neo4j container build script
        ├─ conf                   configuration files for neo4j and plugins
        ├─ logs                   neo4j logs
        ├─ data
        │   ├─ dumps              dumped databases
        │   ├─ raw                raw, original data (dev only)
        │   ├─ import             files for neo4j importing (dev only)
        │   └─ db                 neo4j graph storage volume (dev only)
        ├─ work                   work scripts & notebooks
        └─ docs                   documentation folder
            └─ readme.md

Set up neo4j database

To deploy the graph database on your computer as a container, you need Docker installed.

In the repository root folder, build the neo4j image containing the graph:

    cd skm-neo4j
    docker build --tag skm-graph .

To run this image:

    docker run -it \
        -p 7474:7474 -p 1337:1337 -p 7687:7687 \
        -v "$(pwd)/logs:/logs" -v "$(pwd)/conf:/conf" \
        skm-graph

Then open the neo4j browser at http://localhost:7474/.
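The neo4j server inside the container can take a while to start, so the browser may not respond straight away. Below is a minimal sketch of a readiness check that polls the HTTP port until the server answers. It uses only the Python standard library and is not part of the repository. The function name wait_for_neo4j, the default URL and the timeout values are illustrative assumptions; adjust them to your port mapping.

```python
import http.client
import time
import urllib.error
import urllib.request


def wait_for_neo4j(url="http://localhost:7474/", timeout=60.0, interval=1.0):
    """Poll `url` until it answers or `timeout` seconds have passed.

    Returns True once the server responds, False if it never did.
    The default URL assumes the -p 7474:7474 mapping used above (hypothetical helper).
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # Any HTTP status (even an error) means the server accepts connections.
            return True
        except (OSError, http.client.HTTPException):
            # URLError (e.g. connection refused) and socket timeouts are OSError
            # subclasses: the server is not reachable yet, so try again.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, wait_for_neo4j(timeout=120) returns True as soon as http://localhost:7474/ responds. An error status still counts as "up", because only a running server can send one.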

Link the neo4j database to a Jupyter notebook

Use docker-compose to start the neo4j database together with a linked Jupyter notebook. The first time, navigate to the repository folder (containing docker-compose.yaml) and run:

    docker-compose up

After that, as long as the containers have not been removed, you can use:

    docker-compose start

In your browser, open the http://localhost:8888/?token=... link printed in the terminal. If the logs are not shown (for example after docker-compose start, which runs in the background), run

    docker-compose logs | grep 'http://localhost:.*/?token' | tail -1

to find the correct link.
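Once the notebook is open, the graph can be queried from Python. Below is a minimal sketch that assumes the official neo4j Python driver is installed in the notebook environment (pip install neo4j) and that the database accepts Bolt connections on port 7687. The URI, user name and password are placeholders. Inside the docker-compose network, the host is usually the neo4j service name from docker-compose.yaml rather than localhost. The query is the neighbour example from the Cypher section, written with a $name parameter.

```python
# Placeholder connection settings (assumptions): adjust to your docker-compose setup.
NEO4J_URI = "bolt://localhost:7687"
NEO4J_AUTH = ("neo4j", "password")

# Names of all nodes connected to the node called $name, in either direction.
NEIGHBOURS_QUERY = """
MATCH (n {name: $name})--(other)
RETURN DISTINCT other.name AS name
"""


def fetch_neighbours(name, database=None, uri=NEO4J_URI, auth=NEO4J_AUTH):
    """Return the names of all undirected neighbours of the node called `name`.

    fetch_neighbours is a hypothetical helper; database=None uses the
    server's default database.
    """
    # Imported lazily so this cell can be defined before the driver is installed.
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver(uri, auth=auth)
    try:
        with driver.session(database=database) as session:
            result = session.run(NEIGHBOURS_QUERY, name=name)
            # Consume the records before the session closes.
            return [record["name"] for record in result]
    finally:
        driver.close()
```

For example, fetch_neighbours("JAR") returns a list of neighbour names. Passing values as query parameters, rather than formatting them into the query string, avoids quoting problems and lets neo4j reuse the query plan.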

To stop the containers, use:

    docker-compose stop

Or to stop and remove them:

    docker-compose down

Remove database

If you need to remove the graph image, first remove the container:

    docker-compose down

Or get the container ID from the first column in:

    docker ps -a | grep skm-graph

And delete it using:

    docker rm <container id>

Then delete the image:

    docker rmi skm-graph

Delete the contents of the graph data folder (if it exists):

    sudo rm -rf data/db/*

Dump a neo4j graph from Docker

For the neo4j enterprise edition, follow the instructions here.

For the community edition, do the following:

  1. Start up a neo4j container, without starting the neo4j database:

      docker run \
      --name=dump \
      --volume="$(pwd)/data/db:/data" \
      --volume="$(pwd)/data/dumps:/dumps" \
      --ulimit=nofile=40000:40000 \
      --user=1000:1000 \
      -it \
      neo4j:4.3.2 \
      /bin/bash
    
  2. In the container, use neo4j-admin to dump the graph:

      bin/neo4j-admin dump --database=skm --to=/dumps/skm.dump
    

    Press Ctrl+D to exit the container.

  3. Remove the new container:

        docker rm dump
    

Please note that NEWT is a client-side application and therefore relies on the capabilities of your local computer. Given the relatively large size of PSS, rendering the network may take a while.