In this article, we will compare RDBMS, NoSQL DB & Graph DB. This is a development & implementation level comparison which will highlight differences on below criteria.
- Data structures & relationships
- Queries
- Data types like Text, Integer, Date etc.
- Special operations like sort, limit
- Aggregate functions like max
This article might not help in choosing among databases. Choosing among these databases is based on your requirements about data model, data usage, ACID properties etc.
Example Data for comparison
We will take example of storing candidates information for job interviews. We will use MySQL (RDBMS), MongoDB (NoSQL or Document database) & Neo4J (Graph database) & see how this data can be stored & retrieved from these databases.
- Candidate – Name (String), Years Of Experience (Number)
- Contact Details (One to One) – Mobile Number (String), Email id (String)
- Job History (One to Many) – List of all present & past jobs with company name (String), start(Date) & end dates (Date).
Create new database
MySQL
1 2 3 4 |
mysql> CREATE DATABASE itsallbinarydb; -- Provide name of database to create Query OK, 1 row affected (0.17 sec) mysql> use itsallbinarydb; -- Use newly created database Database changed |
MongoDB
1 2 |
> use itsallbinarydb // Provide name of database to create & use switched to db itsallbinarydb |
Neo4j
1 2 3 |
:USE system // First go to system database CREATE DATABASE itsallbinarydb // Provide name of database to create :USE itsallbinarydb // Use newly created database |
Create Data Structure
MySQL
- Data Structure: Table
- Each Data Entry: Row
- Schema: Need to provide table structure upfront including columns names, data types, relationships etc.
Below is a logical way to represent our example using 3 tables.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 |
CREATE TABLE CONTACT_DETAILS ( -- Name of table ID int NOT NULL, -- Integer column EMAIL varchar(255) NOT NULL, -- String or characters column MOBILE varchar(255), PRIMARY KEY (ID) -- ID is primary unique key ); CREATE TABLE CANDIDATE ( ID int NOT NULL, NAME varchar(255) NOT NULL, YEARS_OF_EXP int, CONTACT_DETAILS_ID int, PRIMARY KEY (ID), FOREIGN KEY (CONTACT_DETAILS_ID) REFERENCES CANDIDATE(ID) -- Reference to CONTACT_DETAILS ); CREATE TABLE JOB_HISTORY ( CANDIDATE_ID int NOT NULL, COMPANY_NAME varchar(255) NOT NULL, START_DATE DATE, -- Date type column END_DATE DATE, FOREIGN KEY (CANDIDATE_ID) REFERENCES CANDIDATE(ID) -- Reference to CANDIDATE ); |
MongoDB
- Data Structure: Collection
- Each Data Entry: Document (format like JSON)
- Schema: Only need name of collections upfront. No need to provide structure of document within collection upfront. MongoDB does provide way for schema validation.
We have 2 ways to represent our example data in MongoDB as given below.
1) Embedded: All attributes in single collection
1 2 |
> db.createCollection("CANDIDATE_EMB"); // Provide name of collection { "ok" : 1 } |
2) Normalized/Referential: Three logical collections with references to each other.
1 2 3 4 5 6 |
> db.createCollection("CANDIDATE_REF"); // Provide name of collection { "ok" : 1 } > db.createCollection("CONTACT_DETAILS_REF"); { "ok" : 1 } > db.createCollection("JOB_HISTORY_REF"); { "ok" : 1 } |
Neo4j
- Data Structure: Graph
- Each Data Entry: Nodes & Relationships
- Schema: No need to provide anything upfront.
If everything is node, then how would you know which node represents what data? The answer is “Label“. All nodes with same label can be considered as same type of data. Like in our case CANDIDATE, CONTACT_DETAILS, JOB_HISTORY can be labels to tag the nodes of respective data. So Label can be approximately (but not exactly) analogous to table in RDBMS or collection in NoSQL.
Node with labels can be directly created whenever you get first entry of data.
Insert data
MySQL
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
INSERT INTO CONTACT_DETAILS -- Table name to insert (ID, EMAIL, MOBILE) -- Columns to insert data in VALUES (1, 'abc@g.com', '+11234567890' ); -- Values to insert, String & Number values INSERT INTO CANDIDATE ( ID, NAME, YEARS_OF_EXP, CONTACT_DETAILS_ID) VALUES (1, 'Jimmy', 20, 1 ); INSERT INTO JOB_HISTORY (CANDIDATE_ID, COMPANY_NAME, START_DATE, END_DATE) VALUES (1, 'ABC Inc.', '2010-1-1', sysdate()); -- Date values from String & today's date |
MongoDB
1) Embedded document with all data into single JSON document.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
db.CANDIDATE_EMB.insertOne( // Collection name to insert { // Actual JSON document to insert ID: 1, // Integer data type NAME: "Jimmy", // String data type YEARS_OF_EXP: 20, CONTACT_DETAILS: { EMAIL: "abc@g.com", MOBILE: "+11234567890", }, JOB_HISTORY: [{ COMPANY_NAME: "ABC Inc.", START_DATE: new Date("2010-1-1"), // Date type from String END_DATE: new Date(ISODate()) // Date type for Today's date }] }); |
2) Referential form of storage where data is stored in 3 documents which refer to each other using ids.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 |
db.CANDIDATE_REF.insertOne( // Collection name to insert { ID: 1, NAME: "Jimmy", YEARS_OF_EXP: 20, CONTACT_DETAILS_ID: 1 // Reference to ID of CONTACT_DETAILS_REF }); db.CONTACT_DETAILS_REF.insertOne( { ID: 1, EMAIL: "abc@g.com", MOBILE: "+11234567890", }); db.JOB_HISTORY_REF.insertOne( { COMPANY_NAME: "ABC Inc.", START_DATE: new Date("2010-1-1"), END_DATE: new Date(ISODate()), CANDIDATE_ID: 1 // Reference to ID of CANDIDATE_REF }); |
Neo4j
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 |
CREATE (c:CANDIDATE // c=Variable name to refer later, CANDIDATE=Label of this node { // Attributes of this candidate node in JSON format. ID: 1, // Integer data type NAME: "Jimmy", // String data type YEARS_OF_EXP: 20 }) -[:CONTACTABLE_AT]-> // Label of Relationship between CANDIDATE & CONTACT_DETAILS (cd:CONTACT_DETAILS{ // Label & attributes of CONTACT_DETAILS node EMAIL: "abc@g.com", MOBILE: "+11234567890" }) match(c:CANDIDATE {ID: 1}) // Get node with ID 1 create (c) -[:WORKED_AT]-> // Create another relationship with label WORKED_AT (e:JOB_HISTORY { // Relationship between matched CANDIDATE with JOB_HISTORY COMPANY_NAME: "ABC Inc.", START_DATE: date("2010-1-1") , // Date from string END_DATE: date() // Date type for today's date. }) |
Query data, Search/select/find
MySQL
1 2 3 4 5 |
SELECT * FROM CANDIDATE, CONTACT_DETAILS, JOB_HISTORY -- Tables to fetch data from WHERE CANDIDATE.CONTACT_DETAILS_ID = CONTACT_DETAILS.ID -- Columns to join CANDIDATE & CONTACT_DETAILS AND JOB_HISTORY.CANDIDATE_ID = CANDIDATE.ID -- Columns to join CANDIDATE & JOB_HISTORY AND CANDIDATE.ID = 1; -- Match CANDIDATE with ID 1 |
1 2 3 4 5 |
+----+-------+--------------+--------------------+----+-----------+--------------+--------------+--------------+------------+------------+ | ID | NAME | YEARS_OF_EXP | CONTACT_DETAILS_ID | ID | EMAIL | MOBILE | CANDIDATE_ID | COMPANY_NAME | START_DATE | END_DATE | +----+-------+--------------+--------------------+----+-----------+--------------+--------------+--------------+------------+------------+ | 1 | Jimmy | 20 | 1 | 1 | abc@g.com | +11234567890 | 1 | ABC Inc. | 2010-01-01 | 2020-03-22 | +----+-------+--------------+--------------------+----+-----------+--------------+--------------+--------------+------------+------------+ |
MongoDB
1) Embedded: No need of joins as all data is available in single document. Single query & single collection scan to get all data. pretty() function displays output JSON in pretty printed & readable format.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
> db.CANDIDATE_EMB.find( { "ID" : 1} ).pretty(); // Search CANDIDATE_EMB collection & get document with ID 1 { "_id" : ObjectId("5e7846ca731625fa6d09b454"), "ID" : 1, "NAME" : "Jimmy", "YEARS_OF_EXP" : 20, "CONTACT_DETAILS" : { "EMAIL" : "abc@g.com", "MOBILE" : "+11234567890" }, "JOB_HISTORY" : [ { "COMPANY_NAME" : "ABC Inc.", "START_DATE" : ISODate("2010-01-01T00:00:00Z"), "END_DATE" : ISODate("2020-03-23T05:19:06.750Z") } ] } |
2) Referential: Need to use lookup to aggregate (similar to join in SQL) multiple documents & return as a single document.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 |
> db.CANDIDATE_REF.aggregate([{ ... $lookup:{ ... from: "JOB_HISTORY_REF", // Join with JOB_HISTORY collection ... foreignField: "CANDIDATE_ID", // Join field from JOB_HISTORY collection ... localField: "ID", // Field from JOB_HISTORY collection to join ... as: "JOB_HISTORY_JOIN" // Where to embed data in output ... } ... }, ... { ... $lookup:{ ... from: "CONTACT_DETAILS_REF", // Join with CONTACT_DETAILS_REF collection ... foreignField: "ID", // Join field from CONTACT_DETAILS_REF collection ... localField: "CONTACT_DETAILS_ID", // Field from CONTACT_DETAILS_REF collection to join ... as: "CONTACT_DETAILS_JOIN" // Where to embed data in output ... } ... }, ... { ... $match:{ ID : 1 } ... }]).pretty(); { "_id" : ObjectId("5e784b6e731625fa6d09b455"), "ID" : 1, "NAME" : "Jimmy", "YEARS_OF_EXP" : 20, "CONTACT_DETAILS_ID" : 1, "JOB_HISTORY_JOIN" : [ { "_id" : ObjectId("5e784b7a731625fa6d09b457"), "COMPANY_NAME" : "ABC Inc.", "START_DATE" : ISODate("2010-01-01T00:00:00Z"), "END_DATE" : ISODate("2020-03-23T05:39:06.845Z"), "CANDIDATE_ID" : 1 } ], "CONTACT_DETAILS_JOIN" : [ { "_id" : ObjectId("5e785100731625fa6d09b458"), "ID" : 1, "EMAIL" : "abc@g.com", "MOBILE" : "+11234567890" } ] } |
Neo4j
Below query is executed in Neo4J Browser.
1 2 3 |
match(c:CANDIDATE {ID: 1}) // Match nodes with label CANDIDATE & get node with ID 1 -[:CONTACTABLE_AT|WORKED_AT]->(cj) // Get the all nodes related to matched node with relationship label as CONTACTABLE_AT or WORKED_AT (refered as cj variable) return * |
This is a graphical representation of results which is rendered by Neo4J Browser.
Modify/Alter Data Structure
MySQL
1 2 3 4 |
mysql> ALTER TABLE CANDIDATE -- Modify structure of table CANDIDATE ADD COLUMN SKILL VARCHAR(1000); -- Add new column Skill Query OK, 0 rows affected (0.31 sec) Records: 0 Duplicates: 0 Warnings: 0 |
1 2 3 4 5 6 7 8 9 10 11 12 13 |
INSERT INTO CANDIDATE -- Insert new row with the data in newly added column ( ID, NAME, YEARS_OF_EXP,SKILL, CONTACT_DETAILS_ID) VALUES (2, 'Tom', 25, 'Java', null ); mysql> select * from CANDIDATE; -- Fetch all CANDIDATE data & observer data +----+-------+--------------+--------------------+-------+ | ID | NAME | YEARS_OF_EXP | CONTACT_DETAILS_ID | SKILL | +----+-------+--------------+--------------------+-------+ | 1 | Jimmy | 20 | 1 | NULL | | 2 | Tom | 25 | NULL | Java | +----+-------+--------------+--------------------+-------+ 2 rows in set (0.00 sec) |
MongoDB
MongoDB doesn’t have structure as it is schemaless. So no need of any alter. We can directly insert new row with any additional field.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 |
> db.CANDIDATE_EMB.insertOne( //Insert new data with new "Skill" field ... { ... ID: 2, ... NAME: "Tom", ... YEARS_OF_EXP: 25, ... SKILL: "Java" ... }); > db.CANDIDATE_EMB.find( {} ).pretty(); // Fetch all CANDIDATE_EMB & observer data. { "_id" : ObjectId("5e7846ca731625fa6d09b454"), "ID" : 1, "NAME" : "Jimmy", "YEARS_OF_EXP" : 20, "CONTACT_DETAILS" : { "EMAIL" : "abc@g.com", "MOBILE" : "+11234567890" }, "JOB_HISTORY" : [ { "COMPANY_NAME" : "ABC Inc.", "START_DATE" : ISODate("2010-01-01T00:00:00Z"), "END_DATE" : ISODate("2020-03-23T05:19:06.750Z") } ] } { "_id" : ObjectId("5e7c2e8baecacc3833e9a967"), "ID" : 2, "NAME" : "Tom", "YEARS_OF_EXP" : 25, "SKILL" : "Java" } > |
Neo4j
Neo4j data are nodes so all nods are free to have any attributes. So no need of any alter & we can directly add new node with new fields.
1 2 3 4 5 6 |
CREATE (c:CANDIDATE { // Insert new node with label CANDIDATE with new attribute "Skill" ID: 2, NAME: "Tom", YEARS_OF_EXP: 25, SKILL: "Java" }) |
1 2 3 4 5 6 7 8 9 10 11 |
{ "YEARS_OF_EXP": 20, "ID": 1, "NAME": "Jimmy" } { "YEARS_OF_EXP": 25, "ID": 2, "SKILL": "Java", "NAME": "Tom" } |
Sorting & limiting results (Top results)
In earlier section, we inserted new data with YEARS_OF_EXP as 25 which was top max YEARS_OF_EXP.
MySQL
1 2 3 4 5 6 7 8 |
mysql> SELECT YEARS_OF_EXP FROM CANDIDATE -- Search all CANDIDATE data ORDER BY YEARS_OF_EXP DESC -- Order searched data in descending order LIMIT 1; -- Only return top 1 data. +--------------+ | YEARS_OF_EXP | +--------------+ | 25 | +--------------+ |
MongoDB
Reference for projection.
1 2 3 4 |
> db.CANDIDATE_EMB.find({},{YEARS_OF_EXP:1}) // First {} is query i.e. find all & second curly brace is projection i.e. find docs with YEARS_OF_EXP existing. .sort({YEARS_OF_EXP:-1}) // Sort using field YEARS_OF_EXP. -1 is for descending. .limit(1) // Limit results to only top 1. { "_id" : ObjectId("5e7c2e8baecacc3833e9a967"), "YEARS_OF_EXP" : 25 } |
Neo4j
1 2 3 4 5 6 7 8 9 10 |
match(c:CANDIDATE {}) // Match all nodes with label CANDIDATE & refer with variable c RETURN c // return c ORDER BY c.YEARS_OF_EXP DESC // Order result c by YEARS_OF_EXP in descending order LIMIT 1 // Limit result to only top 1. ╒══════════════════════════════════════════════════════╕ │"c" │ ╞══════════════════════════════════════════════════════╡ │{"YEARS_OF_EXP":25,"ID":2,"SKILL":"Java","NAME":"Tom"}│ └──────────────────────────────────────────────────────┘ |
Aggregate function (Max)
MySQL
1 2 3 4 5 6 |
mysql> SELECT MAX(YEARS_OF_EXP) FROM CANDIDATE; -- Get data from CANDIDATE with maximum YEARS_OF_EXP +-------------------+ | MAX(YEARS_OF_EXP) | +-------------------+ | 25 | +-------------------+ |
MongoDB
1 2 3 4 5 6 7 8 9 |
> db.CANDIDATE_EMB.aggregate([ // Get all documents from CANDIDATE_EMB { $group: { _id: null, maxQuantity: {$max: "$YEARS_OF_EXP"} // Aggregate documents & get document with max YEARS_OF_EXP } } ]) { "_id" : null, "maxQuantity" : 25 } |
Neo4j
1 2 3 4 5 6 7 8 9 10 11 |
match(c:CANDIDATE {}) // Match all nodes with label CANDIDATE WITH max(c.YEARS_OF_EXP) AS maximum // Match nodes with maximum YEARS_OF_EXP MATCH (c:CANDIDATE {}) // This is optional. Again match all nodes with label CANDIDATE.... WHERE c.YEARS_OF_EXP = maximum // ... where YEARS_OF_EXP is max RETURN c // Return that entire candidate ╒══════════════════════════════════════════════════════╕ │"c" │ ╞══════════════════════════════════════════════════════╡ │{"YEARS_OF_EXP":25,"ID":2,"SKILL":"Java","NAME":"Tom"}│ └──────────────────────────────────────────────────────┘ |
References
- MongoDB Community Server Install
- While installation on Windows 10, it gives error “Invalid domain/user/password”. Use domain as “.” & your Windows machine’s user name & password & then it works. Then keep continuing with default options on all installation screens.
- CLI – Run C:\Program Files\MongoDB\Server\4.2\bin\mongo.exe
- Neo4j Desktop Installation
- Download Neo4j desktop which is for local development setup. It asks for few details like name, email etc. & assigns a free key which can be used while installation.
- Neo4j query language is called Cypher.
nice article. expecting more articles like this. when to use what type db with more depth about volume of data, TPS etc.