Half-Terabyte Benchmark: Neo4j vs. TigerGraph

Neo4j is ranked as the top graph database by DB-Engines, and Strata Data recently gave its Most Disruptive Startup award to TigerGraph. Let's see how they compare.

Amanda Shen

Updated Oct. 01, 18 · Analysis

Graph databases have been becoming more and more popular and are getting lots of attention.

To find out how graph databases perform, I researched the state-of-the-art benchmarks and found that loading speed, loaded data storage, query performance, and scalability are the common benchmark features. However, those benchmarks' test datasets are too small, ranging from 4 MB to 30 GB. So, I decided to do my own benchmark. Let's play with a huge dataset: half a terabyte.

Because such a huge graph dataset is hard to find on the internet, I generated my own test dataset, which mimics daily phone call records. Here is a sample:

caller	callee	countryCode
1497410818	1349791947	11111
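
The generator used for the benchmark is not shown in this article. As a rough illustration only, here is a minimal Python sketch that produces rows in the same three-column, tab-separated layout. The function names, the ID range, the uniform random choice of caller and callee, and the constant country code are all assumptions made for this example.

```python
import csv
import io
import random


def generate_calls(num_calls, num_phones, seed=0):
    """Yield (caller, callee, countryCode) tuples for synthetic call records.

    num_phones must be at least 2 so that a distinct callee always exists.
    """
    rng = random.Random(seed)
    base = 1_000_000_000  # assumption: 10-digit phone IDs, like the sample row
    for _ in range(num_calls):
        caller = base + rng.randrange(num_phones)
        callee = base + rng.randrange(num_phones)
        while callee == caller:  # redraw to avoid self-calls
            callee = base + rng.randrange(num_phones)
        yield caller, callee, 11111  # assumption: constant country code


def write_calls(out, rows):
    """Write rows as tab-separated text with the header from the sample."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["caller", "callee", "countryCode"])
    writer.writerows(rows)


buf = io.StringIO()
write_calls(buf, generate_calls(num_calls=5, num_phones=100))
print(buf.getvalue())
```

For a real run, `out` would be a file opened for writing, and the number of calls would be in the billions rather than five.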

Since Neo4j is ranked as the top graph database by DB-Engines, I was curious about its performance. And since Strata Data recently gave its Most Disruptive Startup award to TigerGraph, let's see how TigerGraph differs.

Test Setup

Hardware

I used an Amazon EC2 machine.

EC2 type	IOPS (SSD)	CPUs	Memory	Volume type	OS	Disk size
r4.4xlarge	32000	16	122 GiB	io1	ubuntu 14	3 TB

Software

I used the latest downloadable versions of both database systems:

TigerGraph Developer Edition
Neo4j 3.4.7 Community Edition

Dataset

The phone call edges are split across 21 files; each file is around 24 GB. The total size of the dataset is 501 GB.

Name	Vertices #	Edges #
phoneCall	500,000,000	19,186,683,044

Description of Tests

The goal of the benchmark is to measure the performance of each database system when there is not enough memory to hold the whole dataset. To measure this, I chose an EC2 r4.4xlarge (122 GiB of memory) as the server. To my surprise, TigerGraph compressed the raw data to 14% of its original size, so the loaded data fit in memory.

The following test cases have been included:

Data loading: Bulk loading method supported by each database system.

	Neo4j-Cypher	TigerGraph
Built-in loading language	YES	YES
Requires separate vertex file	YES	NO
Incremental data loading	YES	YES
Index build during loading	NO	YES
Vertex ID deduplication	YES	YES
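
Because Neo4j needs a separate vertex file, the distinct vertex IDs have to be pulled out of the edge files before loading. The preprocessing script used in this benchmark is not shown, so the following is only a minimal sketch of the idea. The function name is hypothetical, and it keeps all IDs in an in-memory set, which would not scale to 500 million vertices without a lot of RAM; at that size, an external sort and deduplication step is the more likely approach.

```python
import csv
import io


def extract_vertices(edge_file, has_header=True):
    """Return the sorted distinct vertex IDs found in a tab-separated edge file."""
    reader = csv.reader(edge_file, delimiter="\t")
    if has_header:
        next(reader, None)  # skip the "caller callee countryCode" header
    vertices = set()
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        vertices.add(row[0])  # caller
        vertices.add(row[1])  # callee
    return sorted(vertices)


sample = io.StringIO(
    "caller\tcallee\tcountryCode\n"
    "1497410818\t1349791947\t11111\n"
    "1349791947\t1497410818\t11111\n"
)
for vertex_id in extract_vertices(sample):
    print(vertex_id)
```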

Storage size: Storage size of the loaded datasets.
k-hops query performance: I search for distinct, directed neighbors starting from six randomly selected vertices, returning total counts for discovered neighbors.

	1-hop	3-hops	6-hops
query timeout	180 s	9000 s	9000 s
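
To make the k-hops test concrete, here is a small in-memory Python sketch of what such a query computes. It is not the Cypher or GSQL query used in the benchmark. It assumes that "k-hops neighbors" means every distinct vertex reachable from the start vertex in at most k directed hops, excluding the start vertex itself; the function names are made up for this example.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Map each caller to the set of callees it has an outgoing edge to."""
    adj = defaultdict(set)
    for caller, callee in edges:
        adj[caller].add(callee)
    return adj


def k_hop_neighbor_count(adj, start, k):
    """Count distinct vertices reachable from start in at most k directed hops."""
    visited = {start}
    frontier = {start}
    for _ in range(k):
        next_frontier = set()
        for vertex in frontier:
            for neighbor in adj.get(vertex, ()):
                if neighbor not in visited:
                    visited.add(neighbor)
                    next_frontier.add(neighbor)
        if not next_frontier:
            break  # nothing new was discovered, so deeper hops add nothing
        frontier = next_frontier
    return len(visited) - 1  # do not count the start vertex


adj = build_adjacency([(1, 2), (2, 3), (3, 4), (1, 3), (4, 1)])
for hops in (1, 2, 3):
    print(hops, k_hop_neighbor_count(adj, start=1, k=hops))
```

The frontier can grow very quickly with each hop, which is why the six-hops case is so much harder on memory than the one-hop case.
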
PageRank: Traverses every edge during each iteration. I chose ten iterations for PageRank and ran it three times to calculate the average execution time. In this test, I set the timeout to 24 hours.
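
For readers unfamiliar with PageRank, the sketch below shows a plain power-iteration version in Python that visits every edge once per iteration, which is why the runtime grows with the edge count. It is not the implementation run on either database. The damping factor of 0.85 and the handling of vertices without outgoing edges are assumptions, since the article does not specify them.

```python
def pagerank(edges, iterations=10, damping=0.85):
    """Run a fixed number of PageRank iterations over a directed edge list."""
    vertices = set()
    out_edges = {}
    for src, dst in edges:
        vertices.update((src, dst))
        out_edges.setdefault(src, []).append(dst)

    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(iterations):
        # Rank held by vertices with no outgoing edges is spread evenly.
        dangling = sum(rank[v] for v in vertices if v not in out_edges)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in vertices}
        # Every edge is visited once per iteration.
        for src, targets in out_edges.items():
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank


ranks = pagerank([("a", "b"), ("b", "c"), ("c", "a"), ("d", "a")])
for vertex, score in sorted(ranks.items()):
    print(vertex, round(score, 4))
```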

Overall Results

Loading Time

Neo4j required extra time to build the index and to extract the vertex file from the edge file. In my test, preparing the node file for Neo4j took an extra 8.7 hours, which is not included in the table below.

	TigerGraph	Neo4j
Load time	13.46 h	7.479 h (7h 28m 44s 571ms)
Index build	-	0.819 h (49 m 8 s)
Total	13.46 h	8.298 h

Storage Size After Loading

	TigerGraph	Neo4j
size	74.095 GB	1.4 TB

K-Hops-Neighbors Query Performance

	1-hop	3-hops	6-hops
TigerGraph	6.093 ms	0.053 s	433.796 s
Neo4j	151.015 ms	95.847 s	all out-of-memory

Page Rank Query Performance

	AVG time
TigerGraph	3.07 hours
Neo4j	cannot complete within 24 hours

Conclusion

Neo4j's loading time is shorter than TigerGraph's; however, Neo4j requires extra preprocessing to extract the vertex file from the edge file. Once the preprocessing time is included, Neo4j takes longer to load than TigerGraph.
TigerGraph compresses the data effectively and needs 19.3x less storage space than Neo4j.
On the one-hop path query, TigerGraph is 24.8x faster than Neo4j.
On the three-hops path query, TigerGraph is 1808.43x faster than Neo4j.
TigerGraph completed the six-hops path query without difficulty; the Neo4j query process was killed by the OS out-of-memory killer after two hours.
Neo4j could not complete the PageRank query within one day.
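
The ratios in this conclusion can be re-derived from the result tables with a few lines of Python. The only assumption is that 1 TB is treated as 1024 GB for the storage comparison, which is what reproduces the 19.3x figure.

```python
GB_PER_TB = 1024  # assumption: binary units for the 1.4 TB figure

# Loading time, in hours, from the Loading Time section.
tigergraph_load_h = 13.46
neo4j_load_h = 7.479 + 0.819         # load + index build
neo4j_total_h = neo4j_load_h + 8.7   # plus node-file preparation

# Storage after loading, in GB.
storage_ratio = 1.4 * GB_PER_TB / 74.095

# k-hops query times: milliseconds for 1-hop, seconds for 3-hops.
one_hop_speedup = 151.015 / 6.093
three_hop_speedup = 95.847 / 0.053

print(f"Neo4j load incl. preprocessing: {neo4j_total_h:.3f} h")
print(f"TigerGraph load:                {tigergraph_load_h:.3f} h")
print(f"Storage ratio:     {storage_ratio:.1f}x")
print(f"1-hop speedup:     {one_hop_speedup:.1f}x")
print(f"3-hops speedup:    {three_hop_speedup:.2f}x")
```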
