Booting-Up Your Network with GrapheneDB using Neo4j and Python

Zecca J. Lehn
6 min read · Dec 17, 2018
Photo by Jimena on Unsplash

Graphs (think vertices and edges) model the relationships in data. The metadata provided by social networks, for example, can help uncover hidden relationships among n-grams, likes, demographic data, tags, links, and names. Graph databases, and the applications that support them, help explore this metadata in a faster-to-compute distributed environment.

Many open source packages and graph platforms exist, with support from both Python and R (even Spark). For this post I chose the popular Neo4j, primarily because I’ve been wanting to learn how to use it, and because there is a great book on it (currently available for free here from O’Reilly) that covers its Cypher query language.

Example Cypher Query
Photo by Markus Spiske on Unsplash

Let’s Connect

As Data Scientists, we focus primarily on topics such as data cleaning, exploration, munging, statistics, supervised / unsupervised machine learning models, distributed / parallel programming, and data infrastructure. Working with graphs, however, is more closely aligned with Information Science, a field of applied research that overlaps with Data Science. I hope this post helps bring these two fields together, along with more infrastructural fields such as Data Engineering and DevOps. Follow my Twitter account for more posts like this via #FullStackDS, which explores the intersection between these specialized fields.

When I first explored Neo4j this week, I set it up locally on Windows by installing the .exe and opening a localhost port to connect. Connecting locally was frustrating: it caused issues with Python packages, especially with https and the bolt protocol (encrypted service), and with plotting D3 inline. The conflicts were compounded by newer versions of Python packages such as py2neo (which is fantastic, by the way), though I was finally able to get it working with GrapheneDB and D3 in the post below. As a side track, this also led me to explore options such as Google Cloud Platform (GCP), setting up ports and managing security settings to connect remotely. This can be done, but it’s not an easy way to start exploring Neo4j with Python quickly. Fortunately, I came across a great project, GrapheneDB, which lets you quickly spin up databases for free or at a low cost, and which can be securely tunneled into from a local Jupyter Notebook running a Python 3.5 kernel in a snap!

Let’s walk through getting you set up with GrapheneDB first, before we make our first graph and visualize it as a D3 network node plot directly within Jupyter.

Step 1: GrapheneDB Setup

After you get yourself set up at https://www.graphenedb.com/, we’ll want to create a Neo4j database. Click on Create Database.

Next, while still on the New Database page, we’ll name our test database. Here we’ll call it “neoawesomeness”, all lowercase.

And then click Create Database. This will bring us to a new page, where we’ll create a New User.

Next we’ll create an admin user “neoAwesomeUser” controlling this database, and select the option to leave the database active indefinitely. We can choose to have these expire by month, week, or even hours.

Once the new user is created, we’ll want to store our new password for the database. Don’t lose it, because from what I can gather, it won’t be recoverable.

Now, from the Network Access page we can get the address of our new remote Neo4j database, along with other statistics and the ability to control users’ access rights.

Note: We’ll copy the HTTP URL and paste it into the code without the trailing ‘/db/data/’ path.
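If you would rather not trim the address by hand, a small helper can do it. This is only a sketch: the function name base_url and the host and port in the example are placeholders I made up, not values from GrapheneDB.

```python
from urllib.parse import urlsplit, urlunsplit


def base_url(rest_url):
    """Return the address with any trailing /db/data/ path removed."""
    parts = urlsplit(rest_url)
    path = parts.path.rstrip("/")
    if path.endswith("/db/data"):
        path = path[: -len("/db/data")]
    # Drop query string and fragment; keep scheme, host and port as given.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


# Placeholder address; paste the one shown for your own database instead.
copied = "https://your-db-host.example.com:24780/db/data/"
print(base_url(copied))
```

An address that has no ‘/db/data/’ suffix passes through unchanged, so it is safe to call on an already trimmed URL.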

Step 2: Connect GrapheneDB to Python 3.5

Assuming you already have Anaconda on your local machine and have created and activated a Python 3.5 conda environment to work from, you’ll now want to launch Jupyter Notebook in your web browser. As of now, using Python 3.5, you’ll want to first pip install py2neo==3.0.0 to work with the code below, since newer versions of py2neo have connection issues. All of the code can be accessed via a GitHub Gist here.
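For readers without the Gist at hand, a connection might look roughly like the sketch below. It is not the Gist itself. I am assuming the py2neo 3.x Graph constructor, which accepts a URL plus user, password and bolt settings; the environment variable names and the choice of bolt=False (connect over HTTP(S) only) are my own assumptions.

```python
import os


def graph_settings(env=os.environ):
    """Collect py2neo connection settings from environment variables.

    GRAPHENEDB_USER and GRAPHENEDB_PASSWORD are hypothetical names;
    use whatever names you stored your credentials under.
    """
    return {
        "user": env["GRAPHENEDB_USER"],
        "password": env["GRAPHENEDB_PASSWORD"],
        "bolt": False,  # assumption: use the HTTP(S) address, not bolt
    }


def connect(url, env=os.environ):
    """Open a py2neo Graph for the given GrapheneDB address."""
    # Imported here so graph_settings() works even without py2neo installed.
    from py2neo import Graph

    return Graph(url, **graph_settings(env))
```

In a notebook you would then call graph = connect(url) with the address copied from the Network Access page. Reading the password from the environment keeps it out of the notebook.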

Next we run the below Node and Relationship creation, adapted from the excellent Python / Neo4j notebook series created by Nicole White when she was with Neo4j (ref: https://nicolewhite.github.io/neo4j-jupyter/hello-world.html)
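A rough sketch of that kind of Node and Relationship creation follows. The people and drinks are illustrative values in the spirit of that notebook, not a copy of it, and load is a hypothetical helper name. It assumes the py2neo 3.x Node, Relationship and graph.create calls.

```python
# Illustrative data: key -> (label, properties)
NODES = {
    "nicole": ("Person", {"name": "Nicole"}),
    "drew": ("Person", {"name": "Drew"}),
    "mtdew": ("Drink", {"name": "Mountain Dew"}),
    "cokezero": ("Drink", {"name": "Coke Zero"}),
}

# (start key, relationship type, end key)
RELATIONSHIPS = [
    ("nicole", "LIKES", "cokezero"),
    ("nicole", "LIKES", "mtdew"),
    ("drew", "LIKES", "mtdew"),
]


def load(graph):
    """Create the nodes and relationships above in a py2neo Graph."""
    # Imported here so the data above can be inspected without py2neo.
    from py2neo import Node, Relationship

    nodes = {key: Node(label, **props) for key, (label, props) in NODES.items()}
    for node in nodes.values():
        graph.create(node)
    for start, rel_type, end in RELATIONSHIPS:
        graph.create(Relationship(nodes[start], rel_type, nodes[end]))
    return nodes
```

Keeping the data in plain Python structures makes it easy to check before anything is written to the remote database.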

Here we check the node labels, which we’ll need for plotting. In addition, .relationship_types gives the names of the actions attached to the edges. To print out the types of drinks Nicole likes, we take each rel that starts at her Person node and read the name of the node at its end.
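As a sketch of these checks, the snippet below reads the labels and relationship types and then lists the drinks a person likes. Note that it swaps in a Cypher query for the lookup instead of walking rel objects. I am assuming that py2neo 3.x exposes node_labels and relationship_types as attributes of the graph and that graph.run accepts keyword parameters. The $name parameter style is also an assumption; older Neo4j servers may need {name} instead.

```python
LIKES_QUERY = """
MATCH (p:Person {name: $name})-[:LIKES]->(d:Drink)
RETURN d.name AS drink
"""


def describe(graph):
    """Return sorted node labels and relationship types of the graph."""
    return sorted(graph.node_labels), sorted(graph.relationship_types)


def drinks_liked_by(graph, name):
    """Return the names of the drinks liked by the named person."""
    return [record["drink"] for record in graph.run(LIKES_QUERY, name=name)]
```

With a connected graph, describe(graph) gives the labels to use when plotting, and drinks_liked_by(graph, "Nicole") gives the drink names.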

Here we’ll use the neo4jupyter package, generously shared by Gabi Maeztu, which allows py2neo classes to be used with inline D3 plotting. His package is significant because other examples lack inline D3 features where labels are visible. However, as you’ll note from the rest of the notebook here, some filtering features may be lacking. Therefore, we’ll turn back to GrapheneDB to view the data more interactively as a dynamic visualization.
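A minimal sketch of the plotting call is below. It assumes, as I understand the neo4jupyter README, that neo4jupyter.draw(graph, options) takes a dictionary mapping each node label to the property shown as its caption. The helper names are hypothetical.

```python
def label_options(labels, prop="name"):
    """Map every node label to the property used as its caption."""
    return {label: prop for label in labels}


def draw_inline(graph):
    """Draw the graph inline, captioning each node with its name."""
    # Imported here so label_options() works without neo4jupyter installed.
    import neo4jupyter

    return neo4jupyter.draw(graph, label_options(graph.node_labels))
```

In a notebook, neo4jupyter.init_notebook_mode() is normally run once in an earlier cell before drawing.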

Step 3: Launch Interactive Shell Neo4j / GrapheneDB

We’ll return to GrapheneDB in our browser and launch the interactive shell for our database.

Once it has launched in a new browser tab, run MATCH (n) RETURN n LIMIT 25 at the $ prompt in the top header, and we’ll get the same interactive plot that was returned in Jupyter Notebook, using the Cypher query language under the hood!

Summary

Photo by Ian Robinson on Unsplash

This short walkthrough shows the power of Neo4j together with GrapheneDB, without the need to install Neo4j locally. It gives us the ability to drill down into our complex metadata and explore relationships with minimal engineering overhead. I look forward to exploring more on the use of graphs in future posts. In the meantime, check out my current post on using Python to explore metadata within your GitHub Gists. All code, and additional exploration of the excellent Movie Database using py2neo, can be found in the attached Jupyter Notebook here.

Note: Neo4j uses “Node” for Vertex and “Relationship” for Edge.
