Big Data - Turning Point: Book KTA (Key Take Away): Graph Databases – NEW OPPORTUNITIES FOR CONNECTED DATA

Book name: Graph Databases – NEW OPPORTUNITIES FOR CONNECTED DATA

Authors – Ian Robinson, Jim Webber and Emil Eifrem

Publisher – O’REILLY MEDIA

The book can be downloaded for free from http://neo4j.com/books/

Chapter 6, Graph Database Internals, discusses how graph databases are implemented. It surveys the most common architectures and uses the Neo4j graph database architecture as the basis for discussion.

Native Graph Processing – A database engine that uses index-free adjacency is one in which each node maintains direct references to its adjacent nodes. Each node therefore acts as a micro-index of nearby nodes, which is much cheaper than using global indexes. A nonnative graph database engine, in contrast, uses (global) indexes to link nodes together. The chapter also gives a good explanation of how index-free adjacency leads to low-cost “joins”.
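To make the contrast concrete, here is a minimal Python sketch (not Neo4j code). The `Node` class, the sorted edge list standing in for a global index, and the sample names are all assumptions made for illustration.

```python
import bisect


class Node:
    """Hypothetical node that keeps direct references to its neighbours."""

    def __init__(self, name):
        self.name = name
        self.neighbours = []  # direct references: the node's own "micro-index"

    def connect(self, other):
        self.neighbours.append(other)
        other.neighbours.append(self)


def friends_native(node):
    # Native style: follow the references held by this one node.
    # The work depends on the node's degree, not on the size of the graph.
    return sorted(n.name for n in node.neighbours)


def build_global_index(edges):
    # Non-native style: one sorted index over every edge in the graph.
    index = []
    for a, b in edges:
        index.append((a, b))
        index.append((b, a))
    index.sort()
    return index


def friends_via_global_index(name, index):
    # Every hop starts with a search in the global index (binary search here),
    # so the cost of a hop grows with the total number of edges.
    start = bisect.bisect_left(index, (name, ""))
    result = []
    for src, dst in index[start:]:
        if src != name:
            break
        result.append(dst)
    return result


alice, bob, carol, dave = (Node(n) for n in ("alice", "bob", "carol", "dave"))
alice.connect(bob)
alice.connect(carol)
bob.connect(dave)

index = build_global_index([("alice", "bob"), ("alice", "carol"), ("bob", "dave")])

print(friends_native(alice))                     # ['bob', 'carol']
print(friends_via_global_index("alice", index))  # ['bob', 'carol']
```

Both functions return the same answer. The difference is that the native version never touches any data outside the node it starts from.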

Native Graph Storage – Neo4j stores graph data in a number of different store files. Each store file contains the data for a specific part of the graph (e.g., there are separate stores for nodes, relationships, labels, and properties). The chapter then explains the record structure of the Neo4j node and relationship store files in detail.
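The book describes these stores as files of fixed-size records, which is what lets a record be located directly from its ID. The Python sketch below illustrates only that idea. The three-field layout (in-use flag, first relationship ID, first property ID) is a simplified assumption, not the actual Neo4j record format, and an in-memory buffer stands in for the store file.

```python
import io
import struct

# Hypothetical, simplified node record: 1-byte in-use flag,
# 4-byte first relationship ID, 4-byte first property ID.
NODE_RECORD = struct.Struct("<BII")


def write_node(store, node_id, first_rel_id, first_prop_id):
    # Records are fixed size, so the offset is simply ID * record size.
    store.seek(node_id * NODE_RECORD.size)
    store.write(NODE_RECORD.pack(1, first_rel_id, first_prop_id))


def read_node(store, node_id):
    # No index lookup is needed: jump straight to the computed offset.
    store.seek(node_id * NODE_RECORD.size)
    in_use, first_rel, first_prop = NODE_RECORD.unpack(store.read(NODE_RECORD.size))
    return {"in_use": bool(in_use), "first_rel": first_rel, "first_prop": first_prop}


store = io.BytesIO()  # stands in for a node store file
write_node(store, 0, 7, 3)
write_node(store, 1, 9, 4)
write_node(store, 2, 12, 5)

print(NODE_RECORD.size)      # 9
print(read_node(store, 1))   # {'in_use': True, 'first_rel': 9, 'first_prop': 4}
```

Reading node 1 costs one seek and one read, however many nodes the store holds.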

Programmatic APIs – The following APIs are discussed:

Kernel API: Transaction event handlers that allow user code to listen to transactions as they flow through the kernel, and thereafter to react (or not) based on the data content and lifecycle stage of the transaction.
Core API: This is an imperative Java API that exposes the graph primitives of nodes, relationships, properties, and labels to the user. When used for reads, the API is lazily evaluated, meaning that relationships are only traversed as and when the calling code demands the next node.
Traversal Framework: A declarative Java API which enables the user to specify a set of constraints that limit the parts of the graph the traversal is allowed to visit.
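Two ideas from this list, lazy evaluation and constraining a traversal, can be sketched with a Python generator. This is not the Neo4j Java API. The `traverse` function, its `max_depth` constraint, and the sample graph are assumptions made for illustration.

```python
from collections import deque

# Hypothetical adjacency list: node -> nodes it points to.
GRAPH = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": ["e"],
    "e": [],
}

visited_log = []  # records which nodes were actually reached


def traverse(graph, start, max_depth):
    """Lazily yield (node, depth) pairs breadth-first, up to max_depth hops."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        visited_log.append(node)
        yield node, depth  # pause here until the caller asks for the next node
        if depth < max_depth:  # the constraint: do not expand beyond max_depth
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append((neighbour, depth + 1))


walker = traverse(GRAPH, "a", max_depth=2)
first_two = [next(walker), next(walker)]

print(first_two)    # [('a', 0), ('b', 1)]
print(visited_log)  # ['a', 'b']
```

Only the nodes the caller asked for were reached, which is the lazy behaviour described for the Core API. The `max_depth` argument plays the role of a declared constraint on which parts of the graph may be visited, so node `e` is never returned for `max_depth=2`.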

The next section discusses the following nonfunctional characteristics in detail:

Transactions (How transactions are implemented in Neo4j)
Recoverability
Availability (Replication)
Scale – capacity (graph size), latency (response time), and read and write throughput.

Chapter 7, Predictive Analysis with Graph Theory, examines some analytical techniques and algorithms for processing graph data.

The following search and path-finding algorithms are explained in brief:

Depth- and Breadth- First Search
Path-Finding with Dijkstra’s Algorithm
The A* (A-star) Algorithm
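The algorithms above can be sketched in a few lines of Python. These are textbook versions written for this summary, not code from the book. The `ROADS` graph, its weights, and the `ESTIMATES` heuristic table are made-up assumptions.

```python
import heapq
from collections import deque


def _rebuild(parents, node):
    # Walk the parent links back to the start and reverse them into a path.
    path = []
    while node is not None:
        path.append(node)
        node = parents[node]
    return path[::-1]


def bfs_path(graph, start, goal):
    """Breadth-first search: a path with the fewest hops, ignoring weights."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return _rebuild(parents, node)
        for neighbour in graph[node]:
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return None


def a_star(graph, start, goal, heuristic=lambda node: 0):
    """Cheapest path by total weight.

    With the default zero heuristic this is Dijkstra's algorithm.
    A non-zero heuristic must never overestimate the remaining cost and,
    because finished nodes are not revisited, should also be consistent.
    """
    best = {start: 0}
    parents = {start: None}
    frontier = [(heuristic(start), start)]
    done = set()
    while frontier:
        _, node = heapq.heappop(frontier)
        if node in done:
            continue
        done.add(node)
        if node == goal:
            return best[goal], _rebuild(parents, goal)
        for neighbour, weight in graph[node].items():
            cost = best[node] + weight
            if cost < best.get(neighbour, float("inf")):
                best[neighbour] = cost
                parents[neighbour] = node
                heapq.heappush(frontier, (cost + heuristic(neighbour), neighbour))
    return None


# Hypothetical weighted, directed graph: node -> {neighbour: weight}.
ROADS = {
    "A": {"B": 1, "C": 5},
    "B": {"C": 1, "D": 6},
    "C": {"D": 2},
    "D": {},
}
# Hypothetical estimates of the remaining cost to reach "D".
ESTIMATES = {"A": 3, "B": 3, "C": 2, "D": 0}

print(bfs_path(ROADS, "A", "D"))               # ['A', 'B', 'D']
print(a_star(ROADS, "A", "D"))                 # (4, ['A', 'B', 'C', 'D'])
print(a_star(ROADS, "A", "D", ESTIMATES.get))  # (4, ['A', 'B', 'C', 'D'])
```

Breadth-first search finds the route with the fewest hops, while Dijkstra and A* find the cheapest route by weight. A* reaches the same answer as Dijkstra here, but its heuristic steers it toward the goal first.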

The next section explains graph theory and predictive modeling through the following concepts:

Triadic Closures – A triadic closure is a common property of social graphs, where we observe that if two nodes are connected via a path involving a third node, there is an increased likelihood that the two nodes will become directly connected at some point in the future.
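Structural Balance – Takes the sentiment (positive or negative) of the relationships in a triad into account: a closed triad is balanced when all three relationships are positive, or when exactly one is positive and the other two are negative.
Local Bridges – A relationship whose two endpoints share no common neighbours, so that it acts as the only short connection between two otherwise separate sub-graphs.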
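As a rough illustration of how these ideas become predictions, the Python sketch below finds open triads (pairs that triadic closure suggests may connect in future) and local bridges. It uses the common definition of a local bridge as an edge whose endpoints have no shared neighbour. The `SOCIAL` graph and both function names are assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical undirected social graph: person -> set of friends.
SOCIAL = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "carol"},
    "carol": {"alice", "bob", "dave"},
    "dave": {"carol", "erin"},
    "erin": {"dave"},
}


def open_triads(graph):
    """Pairs that share a neighbour but are not yet directly connected.

    Returns {(a, b): set of shared neighbours}; by triadic closure these
    pairs are the likeliest candidates for a future relationship.
    """
    candidates = {}
    for middle, neighbours in graph.items():
        for a, b in combinations(sorted(neighbours), 2):
            if b not in graph[a]:
                candidates.setdefault((a, b), set()).add(middle)
    return candidates


def local_bridges(graph):
    """Edges whose two endpoints have no neighbour in common."""
    bridges = set()
    for a, neighbours in graph.items():
        for b in neighbours:
            if a < b and not (graph[a] & graph[b]):
                bridges.add((a, b))
    return bridges


print(open_triads(SOCIAL))
print(sorted(local_bridges(SOCIAL)))  # [('carol', 'dave'), ('dave', 'erin')]
```

In this sample, alice and dave, bob and dave, and carol and erin come out as candidate future connections. The carol to dave edge is the bridge that joins the alice, bob, carol cluster to the rest of the graph.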

The book also has an appendix that gives a NOSQL overview. Readers new to NOSQL should read this overview first to better understand the book.

Saturday, January 9, 2016

Book KTA (Key Take Away): Graph Databases – NEW OPPORTUNITIES FOR CONNECTED DATA - Part 3
