My ex-colleague and friend Wijnand asked me if I could do a little project together with him. It involves working with graph databases, in particular Neo4j. Graph databases differ from “traditional” relational databases in that they support semi-structured data without a predefined data model. In a way that’s also the case for other NoSQL databases, but when people talk about NoSQL databases, most of them think of the ones that are used to store documents of all types. That’s not what graph databases are for.
A graph database is typically used when the relationships between the data are as important as the data itself. Consider persons. I know a lot of other people. I love some of them. I have worked with more. “KNOW”, “LOVE” and “WORKED WITH” are relationships between persons. In a graph database you create nodes and link them with relationships. Relationships are directional and explicit. Unlike a relational database, where you have to add constraints to enforce referential integrity, a graph database cannot have a stale or dangling relationship.
Back to the persons I know, love, and worked with (yes, an Oxford comma). Persons are nodes, so in the graph database you label those nodes with “Person” or “Persons” (note: define a standard before wildly labeling nodes). A node can even have more than one label.
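To make the idea of labeled nodes and directional relationships a bit more concrete, here is a minimal sketch in plain Python. It is not Neo4j's API; the `Node` and `Graph` classes, the `Colleague` label, and the relationship names are made up for illustration. The only point is that a relationship always has a start, a type, and an end, and that it can only be created between nodes that exist.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    labels: set[str] = field(default_factory=set)  # a node may carry several labels


@dataclass
class Graph:
    nodes: dict[str, Node] = field(default_factory=dict)
    # Each relationship is a (start, type, end) triple, so direction is explicit.
    relationships: list[tuple[str, str, str]] = field(default_factory=list)

    def add_node(self, name: str, *labels: str) -> Node:
        node = Node(name, set(labels))
        self.nodes[name] = node
        return node

    def relate(self, start: str, rel_type: str, end: str) -> None:
        # Refuse relationships whose endpoints do not exist: no dangling links.
        if start not in self.nodes or end not in self.nodes:
            raise KeyError("both ends of a relationship must be existing nodes")
        self.relationships.append((start, rel_type, end))


graph = Graph()
graph.add_node("Me", "Person")
graph.add_node("Wijnand", "Person", "Colleague")  # two labels on one node
graph.relate("Me", "KNOWS", "Wijnand")
graph.relate("Me", "WORKED_WITH", "Wijnand")
```

Trying `graph.relate("Me", "LOVES", "Nobody")` raises a `KeyError` in this sketch, which mimics the "no dangling relationships" rule described above.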
Another thing that makes graph databases so different is that relationships can have properties. Since when do I know my best friend? In what period did I work with Wijnand? When creating/defining the relationship between two nodes, you can give that relationship properties, on which you can search or filter at a later stage.
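A small sketch of what properties on a relationship could look like, again in plain Python rather than in Neo4j itself. The `Relationship` class, the property names (`since`, `from`, `to`), and all the years are invented placeholders, not real data.

```python
from dataclasses import dataclass, field


@dataclass
class Relationship:
    start: str
    rel_type: str
    end: str
    properties: dict = field(default_factory=dict)  # free-form key/value pairs


# Hypothetical data: the years are placeholders for illustration only.
relationships = [
    Relationship("Me", "KNOWS", "Best friend", {"since": 1995}),
    Relationship("Me", "WORKED_WITH", "Wijnand", {"from": 2010, "to": 2014}),
    Relationship("Me", "WORKED_WITH", "Another colleague", {"from": 2016, "to": 2018}),
]


def worked_with_during(rels, year):
    """Names of people whose WORKED_WITH period includes the given year."""
    return [
        r.end
        for r in rels
        if r.rel_type == "WORKED_WITH"
        and r.properties["from"] <= year <= r.properties["to"]
    ]


print(worked_with_during(relationships, 2012))  # ['Wijnand']
```

The filter runs on the relationship itself, not on either of the two nodes, which is exactly the kind of question ("in what period did I work with someone?") that relationship properties are meant to answer.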
The main advantage of a graph database over a relational database is its speed on heavily connected data. When your data has a lot of relationships, a relational database will quickly lose performance once it needs to join multiple large tables (lots of rows). This is not the case with a graph database. It excels at joins, because following relationships is what its engine is made to do in the first place. When just iterating through data, the relational database will probably win, but with a lot of joins, a graph database can make a huge difference.
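One way to picture why following relationships can be cheap: if every node points directly at its neighbours, each extra "hop" is a lookup from the current node instead of another join over a whole table. The sketch below is only an illustration of that idea in plain Python, not a description of how Neo4j's engine is implemented; the names and the `knows` structure are made up.

```python
from collections import deque

# Hypothetical adjacency lists: each person points directly at the people they know.
knows = {
    "Me": ["Wijnand", "Alice"],
    "Wijnand": ["Bob"],
    "Alice": ["Bob", "Carol"],
    "Bob": [],
    "Carol": [],
}


def within_hops(start, max_hops):
    """Breadth-first walk returning everyone reachable in at most max_hops steps."""
    seen = {start}
    queue = deque([(start, 0)])
    reached = []
    while queue:
        person, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the requested depth
        for neighbour in knows[person]:
            if neighbour not in seen:
                seen.add(neighbour)
                reached.append(neighbour)
                queue.append((neighbour, hops + 1))
    return reached


print(within_hops("Me", 2))  # ['Wijnand', 'Alice', 'Bob', 'Carol']
```

Going one hop deeper only touches the neighbours of the people already found, so the work depends on how connected those people are rather than on the total number of persons stored.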
For example: create a persons table in your relational database. Make it so that you can have friends, colleagues, or loved ones that are persons in the same table. Now query for all your friends that were also a colleague within a certain period. Good luck writing your hierarchical query 🙂
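For the curious, here is a rough sketch of what that relational version could look like, using Python's built-in `sqlite3` module. The layout (a `persons` table plus a separate `relations` link table), the column names, and all names and years are my own assumptions for illustration; there are other ways to model this.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE persons (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE relations (
    person_id INTEGER NOT NULL REFERENCES persons(id),
    other_id  INTEGER NOT NULL REFERENCES persons(id),
    kind      TEXT NOT NULL,      -- 'FRIEND', 'COLLEAGUE' or 'LOVED_ONE'
    from_year INTEGER,
    to_year   INTEGER             -- NULL means "still ongoing"
);
-- Hypothetical sample data.
INSERT INTO persons VALUES (1, 'Me'), (2, 'Wijnand'), (3, 'Alice'), (4, 'Bob');
INSERT INTO relations VALUES
    (1, 2, 'FRIEND',    2005, NULL),
    (1, 2, 'COLLEAGUE', 2010, 2014),
    (1, 3, 'FRIEND',    2000, NULL),
    (1, 4, 'COLLEAGUE', 2011, 2013);
""")

# Friends of one person who were also a colleague at some point in a given period.
# The relations table is joined to itself: once as "friend", once as "colleague".
query = """
SELECT p.name
FROM relations AS f
JOIN relations AS c
  ON c.person_id = f.person_id AND c.other_id = f.other_id
JOIN persons AS p ON p.id = f.other_id
WHERE f.person_id = ?
  AND f.kind = 'FRIEND'
  AND c.kind = 'COLLEAGUE'
  AND c.from_year <= ?   -- colleague period starts before the search period ends
  AND c.to_year   >= ?   -- and ends after the search period starts
"""
period_start, period_end = 2011, 2013
rows = conn.execute(query, (1, period_end, period_start))
friends_and_colleagues = [row[0] for row in rows]
print(friends_and_colleagues)  # ['Wijnand']
```

Even this single-hop question already needs a self-join on the link table. Once you start asking about friends of friends, or colleagues of loved ones, every extra step adds another join, and that is where the hierarchical query starts to hurt.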