
Graph Embeddings in Neo4j

洪堂瑋 Tangwei Hung 2021/12/09 09:01:33

Table Of Contents


What's an embedding

Graph Embedding

Graph Embeddings in Neo4j

        Fast Random Projections (FastRP)



Graph Embeddings for News Recommendation dataset



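Graph embedding algorithms are one of the highlights of Neo4j.


Graph embeddings are powerful because they preserve the key features of a graph while reducing dimensionality in a way that can be decoded. This means you can capture the complexity and structure of a graph and turn it into input for a variety of ML predictions.

This article gives a brief introduction to graph embeddings and then applies them in Neo4j, using the MIND News Dataset.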


What's an embedding

- Google: "An embedding is a relatively low-dimensional space into which you can translate high-dimensional vectors"

- Wikipedia: "In mathematics, an embedding is one instance of some mathematical structure contained within another instance, such as a group that is a subgroup."


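We can think of an embedding as a way of mapping something (a document, an image, a graph) to a fixed-length vector (or matrix) that captures its key features while reducing dimensionality.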





Picture a portrait of Winnie the Pooh: it takes him from three-dimensional space into two-dimensional space. In the same way, a portrait embeds a three-dimensional person into two-dimensional space.

- It should be a simplified representation, but

- The representation should still be recognizable


Graph Embedding



A graph embedding translates your graph into fixed-length vectors.

To be useful, your embedding must preserve key features and reduce dimensionality, and it may be decoded to reconstruct your graph.



Graph embeddings can encode different aspects of your graph:

- Node (vertex) embeddings: describe the connectivity of each node

- Edge embeddings: each relationship and the nodes it connects

- Path embeddings: traversals across the graph

- Subgraph embeddings: encode a part of the graph, like a cluster

- Graph embeddings: encode an entire graph into a single vector


Why do we need graph embeddings?



Graph embeddings translate your graph into meaningful numbers that you can use as inputs for machine learning algorithms, simplified visualizations, or similarity measurements.
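As a small illustration of the similarity use case, the sketch below compares embedding vectors with cosine similarity. The node names and the 4-dimensional vectors are made up for the example; real embeddings would come from one of the algorithms covered later in this article.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


# Hypothetical node embeddings, invented purely for illustration.
embeddings = {
    "news_a": [0.9, 0.1, 0.0, 0.3],
    "news_b": [0.8, 0.2, 0.1, 0.4],
    "news_c": [-0.5, 0.7, 0.6, -0.1],
}


def most_similar(target, embeddings):
    """Rank every other node by cosine similarity to the target node."""
    scores = [
        (other, cosine_similarity(embeddings[target], vector))
        for other, vector in embeddings.items()
        if other != target
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


ranking = most_similar("news_a", embeddings)
```

Here `ranking` lists `news_b` ahead of `news_c`, because the vector for `news_b` points in nearly the same direction as the one for `news_a`.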


When to use graph embeddings?

Intent      Real-time local matching      Find global patterns           Create a highly predictive
            for specific questions        and trends for analytics       numerical representation for
                                                                         machine learning

Use when    You know the exact pattern    You know the kind of           You know there's something
            you're looking for            patterns you're looking for    important in the graph but
                                                                         you don't know exactly what
                                                                         to look for


Graph Embeddings in Neo4j


Neo4j offers three types of node embeddings: FastRP, Node2Vec, and GraphSAGE. Each translates every node in your graph into a numerical representation.



Fast Random Projections (FastRP)



The Random Projection embedding uses sparse random projections to generate embeddings. It is an implementation of the FastRP algorithm. It is the fastest of the embedding algorithms and can therefore be useful for obtaining baseline embeddings. The embeddings it generates often perform as well as those from more complex algorithms that take longer to run.

FastRP works via basic linear algebra. It builds on the Johnson-Lindenstrauss lemma, which says that points in a high-dimensional space can be projected into far fewer dimensions while approximately preserving the distances between them.


At each iteration, the algorithm constructs an intermediate embedding by averaging the intermediate embeddings of neighbors from the last iteration or the initial random vector.
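To make the averaging step concrete, here is a simplified pure-Python sketch of a FastRP-style computation. It is not the GDS implementation: the function name `fastrp_sketch`, the sparsity value, the per-iteration L2 normalization, the iteration weights, and the tiny example graph are all assumptions chosen for illustration, and the degree-based scaling of the real algorithm is left out.

```python
import math
import random


def fastrp_sketch(adjacency, dimension=8, iteration_weights=(0.0, 1.0, 1.0),
                  sparsity=3, seed=42):
    """Average neighbor vectors repeatedly, starting from sparse random vectors."""
    rng = random.Random(seed)
    scale = math.sqrt(sparsity)

    def random_sparse_vector():
        # Entries are +sqrt(s), -sqrt(s) or 0 with probability 1/(2s), 1/(2s), 1 - 1/s.
        vector = []
        for _ in range(dimension):
            r = rng.random()
            if r < 1 / (2 * sparsity):
                vector.append(scale)
            elif r < 1 / sparsity:
                vector.append(-scale)
            else:
                vector.append(0.0)
        return vector

    def l2_normalize(vector):
        norm = math.sqrt(sum(x * x for x in vector))
        return [x / norm for x in vector] if norm > 0 else vector

    nodes = sorted(adjacency)  # fixed order keeps the run reproducible
    current = {node: random_sparse_vector() for node in nodes}
    embedding = {node: [0.0] * dimension for node in nodes}

    for weight in iteration_weights:
        averaged = {}
        for node in nodes:
            neighbors = adjacency[node]
            if neighbors:
                mean = [sum(current[m][i] for m in neighbors) / len(neighbors)
                        for i in range(dimension)]
            else:
                mean = [0.0] * dimension
            averaged[node] = l2_normalize(mean)
        current = averaged
        # The final embedding is a weighted sum of the intermediate embeddings.
        for node in nodes:
            for i in range(dimension):
                embedding[node][i] += weight * current[node][i]
    return embedding


# Hypothetical 4-cycle a-b-d-c-a: b and c share exactly the same neighbors.
example_graph = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"], "d": ["b", "c"]}
example_embeddings = fastrp_sketch(example_graph)
```

In the example graph, `b` and `c` have exactly the same neighbors, so every averaging step gives them the same vector and they end up with identical embeddings.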



FastRP can encode node properties and weights as well:

- Properties are specified with featureProperties, and their influence is determined by propertyDimension.

- Node embeddings are constructed by concatenating a topological embedding and a property embedding.

- Properties are embedded in the same way as graph structure: node properties are initialized with a random vector and then, at each iteration, averaged across the properties of neighboring nodes.




Node2Vec

Node2Vec computes embeddings based on biased random walks of a node's neighborhood. The algorithm trains a single-layer feedforward neural network, which is used to predict the likelihood that a node will occur in a walk based on the occurrence of another node. Node2Vec has parameters that can be tuned to control whether the random walks behave more like breadth-first or depth-first searches. This tuning allows the embedding to capture either homophily (similar embeddings capture network communities) or structural equivalence (similar embeddings capture similar structural roles of nodes).

Node2Vec borrows from word embedding techniques: it uses random walks to sample context windows and trains a skip-gram model on them.

What makes Node2Vec attractive is this flexibility: it can learn either homophily (neighboring nodes) or structural embeddings.
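The random-walk part of this idea can be sketched in a few lines of Python. This is an illustrative sketch, not the GDS code: the function names, the example graph, and the window size are assumptions, and the skip-gram training that would consume the pairs is omitted. The `return_factor` and `in_out_factor` arguments play the role of the return and in-out parameters from the Node2Vec paper.

```python
import random


def biased_walk(adjacency, start, walk_length, return_factor=1.0,
                in_out_factor=1.0, rng=None):
    """Second-order random walk: the next step depends on the previous node."""
    if return_factor <= 0 or in_out_factor <= 0:
        raise ValueError("return_factor and in_out_factor must be positive")
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < walk_length:
        current = walk[-1]
        neighbors = adjacency[current]
        if not neighbors:
            break  # dead end
        if len(walk) == 1:
            walk.append(rng.choice(neighbors))
            continue
        previous = walk[-2]
        weights = []
        for candidate in neighbors:
            if candidate == previous:
                weights.append(1.0 / return_factor)   # step back to where we came from
            elif candidate in adjacency[previous]:
                weights.append(1.0)                   # stay close to the previous node
            else:
                weights.append(1.0 / in_out_factor)   # move further away
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


def context_pairs(walk, window_size=2):
    """(center, context) pairs, as a skip-gram model would be trained on."""
    pairs = []
    for i, center in enumerate(walk):
        start = max(0, i - window_size)
        end = min(len(walk), i + window_size + 1)
        pairs.extend((center, walk[j]) for j in range(start, end) if j != i)
    return pairs


# Hypothetical undirected graph, stored as adjacency lists.
example_graph = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
example_walk = biased_walk(example_graph, "a", walk_length=10, rng=random.Random(7))
example_pairs = context_pairs(example_walk, window_size=2)
```

A small `in_out_factor` makes the walk wander away from its starting area, much like a depth-first search, while a large one keeps it local, much like a breadth-first search.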



GraphSAGE

GraphSAGE builds embeddings from node properties as well as graph structure. Training embeddings that include node properties is useful for bringing in information beyond the topology of the graph, such as metadata, attributes, or the results of other graph algorithms. GraphSAGE differs from the other algorithms in that it learns a function to calculate an embedding rather than training an individual embedding for each node.

Because GraphSAGE learns representations of nodes and their properties and returns an inductive model, you can generate embeddings for new, previously unseen data.
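The "learn a function" idea can be shown with a single GraphSAGE-style layer that uses a mean aggregator. This is a minimal sketch under strong assumptions: the weight matrix below is a hand-picked placeholder standing in for trained weights, the feature values are invented, and neighbor sampling, stacked layers, and training itself are all left out.

```python
def mean_vector(vectors, size):
    """Element-wise mean of the neighbor feature vectors (zeros if there are none)."""
    if not vectors:
        return [0.0] * size
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(size)]


def sage_layer(node_features, neighbor_features, weights):
    """ReLU(W x [own features, mean of neighbor features])."""
    size = len(node_features)
    combined = node_features + mean_vector(neighbor_features, size)
    return [max(0.0, sum(w * x for w, x in zip(row, combined))) for row in weights]


# Placeholder weights: 2 output dimensions, 4 inputs (2 own + 2 aggregated features).
placeholder_weights = [
    [1.0, 0.0, 0.5, 0.0],
    [0.0, 1.0, 0.0, -0.5],
]

# Features of two nodes that were already in the graph (invented values).
known_features = {"n1": [1.0, 0.0], "n2": [0.0, 2.0]}

# A node that was never seen before still gets an embedding, because the
# function only needs the node's features and those of its neighbors.
new_node_embedding = sage_layer(
    [2.0, 1.0],
    [known_features["n1"], known_features["n2"]],
    placeholder_weights,
)
```

Because the embedding is computed from features rather than looked up per node, the same `sage_layer` call works for any new node once its features and neighbors are known.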



Graph Embeddings for News Recommendations Dataset


Data Set:

The MIND dataset covers almost 100k users and their click behavior on MSN News. It was made publicly available by Microsoft.




Environment:

- Neo4j: 4.4.0

- APOC library: 4.4.0

- Graph Data Science (GDS) library: 1.8.0



FastRP

1. Create the graph projection


2. Run FastRP and write the embeddings back to the graph database


3. Inspect the FastRP results
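The three FastRP steps could look roughly like the sketch below, which keeps the Cypher statements as Python strings. Several things here are assumptions rather than part of the original walkthrough: the `User` and `News` labels, the `CLICKED` relationship type, the projection name, the `fastrpEmbedding` write property, and the embedding dimension. The procedure names follow GDS 1.x as I understand it (`gds.graph.create` was later renamed), so check them against the documentation for your version.

```python
# Hypothetical names: labels, relationship type, projection name and
# write property are assumptions made for this sketch.
GRAPH_NAME = "mind-fastrp"

FASTRP_STEPS = [
    # 1. Create the graph projection.
    (
        """
        CALL gds.graph.create(
          $graphName,
          ['User', 'News'],
          {CLICKED: {orientation: 'UNDIRECTED'}}
        )
        YIELD graphName, nodeCount, relationshipCount
        """,
        {"graphName": GRAPH_NAME},
    ),
    # 2. Run FastRP and write the embeddings back to the database.
    (
        """
        CALL gds.fastRP.write(
          $graphName,
          {embeddingDimension: 128, writeProperty: 'fastrpEmbedding'}
        )
        YIELD nodePropertiesWritten
        """,
        {"graphName": GRAPH_NAME},
    ),
    # 3. Look at a few of the stored embeddings.
    (
        """
        MATCH (n:News)
        WHERE n.fastrpEmbedding IS NOT NULL
        RETURN id(n) AS nodeId, n.fastrpEmbedding AS embedding
        LIMIT 5
        """,
        {},
    ),
]


def run_steps(session, steps):
    """Run each (query, parameters) pair on any object exposing run(query, parameters)."""
    return [list(session.run(query, parameters)) for query, parameters in steps]
```

With the official Neo4j Python driver, a session object can be passed to `run_steps` as `session`; the statements can also be pasted into Neo4j Browser one at a time, replacing `$graphName` with a literal string.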



Node2Vec

1. Create the graph projection


2. Run Node2Vec and write the embeddings back to the graph database


3. Show the results
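A possible shape for the Node2Vec steps is sketched below, again with the Cypher kept in Python strings. The `User` and `News` labels, the `CLICKED` relationship type, the `node2vecEmbedding` property, and all parameter values are assumptions for illustration. In GDS 1.x Node2Vec sits in the beta tier as far as I know, hence the `gds.beta` prefix; verify the procedure name for your version.

```python
# 1. Create the graph projection (hypothetical labels and relationship type).
NODE2VEC_PROJECTION = """
CALL gds.graph.create(
  $graphName,
  ['User', 'News'],
  {CLICKED: {orientation: 'UNDIRECTED'}}
)
YIELD graphName, nodeCount, relationshipCount
"""

# 2. Run Node2Vec and write the embeddings back to the database.
NODE2VEC_WRITE = """
CALL gds.beta.node2vec.write(
  $graphName,
  {
    embeddingDimension: $embeddingDimension,
    walkLength: $walkLength,
    returnFactor: $returnFactor,
    inOutFactor: $inOutFactor,
    writeProperty: 'node2vecEmbedding'
  }
)
YIELD nodePropertiesWritten
"""

# 3. Show a few of the stored embeddings.
NODE2VEC_SHOW = """
MATCH (n:News)
WHERE n.node2vecEmbedding IS NOT NULL
RETURN id(n) AS nodeId, n.node2vecEmbedding AS embedding
LIMIT 5
"""


def node2vec_parameters(graph_name, embedding_dimension=64, walk_length=80,
                        return_factor=1.0, in_out_factor=1.0):
    """Build the query parameters; the two factors steer how the walks explore."""
    if return_factor <= 0 or in_out_factor <= 0:
        raise ValueError("return_factor and in_out_factor must be positive")
    return {
        "graphName": graph_name,
        "embeddingDimension": embedding_dimension,
        "walkLength": walk_length,
        "returnFactor": return_factor,
        "inOutFactor": in_out_factor,
    }


parameters = node2vec_parameters("mind-node2vec", in_out_factor=0.5)
```

Keeping `returnFactor` and `inOutFactor` as query parameters makes it easy to rerun step 2 with different walk behavior and compare the resulting embeddings.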



GraphSAGE

1. Create the graph projection


2. Train GraphSAGE


3. Write the GraphSAGE embeddings back to the graph database


4. Show the results
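For GraphSAGE the workflow has an extra training step, which the sketch below reflects. Everything specific in it is an assumption: the labels and relationship type, the numeric node properties `feature1` and `feature2` (assumed to exist on every projected node), the model name, and the write property. The `gds.beta.graphSage` procedures are the GDS 1.x names as I understand them, so confirm them against the documentation before running.

```python
# Hypothetical names throughout: labels, relationship type, feature
# properties, model name and write property are assumptions.
GRAPHSAGE_PARAMETERS = {"graphName": "mind-graphsage", "modelName": "mindSageModel"}

GRAPHSAGE_STEPS = {
    # 1. Project the graph together with the node properties used as features.
    "project": """
        CALL gds.graph.create(
          $graphName,
          ['User', 'News'],
          {CLICKED: {orientation: 'UNDIRECTED'}},
          {nodeProperties: ['feature1', 'feature2']}
        )
        YIELD graphName, nodeCount, relationshipCount
    """,
    # 2. Train a model; it is stored under modelName for later use.
    "train": """
        CALL gds.beta.graphSage.train(
          $graphName,
          {
            modelName: $modelName,
            featureProperties: ['feature1', 'feature2'],
            embeddingDimension: 64
          }
        )
        YIELD modelInfo
    """,
    # 3. Apply the trained model and write the embeddings to the database.
    "write": """
        CALL gds.beta.graphSage.write(
          $graphName,
          {modelName: $modelName, writeProperty: 'graphSageEmbedding'}
        )
        YIELD nodePropertiesWritten
    """,
    # 4. Show a few of the stored embeddings.
    "show": """
        MATCH (n:News)
        WHERE n.graphSageEmbedding IS NOT NULL
        RETURN id(n) AS nodeId, n.graphSageEmbedding AS embedding
        LIMIT 5
    """,
}

STEP_ORDER = ["project", "train", "write", "show"]
ordered_statements = [GRAPHSAGE_STEPS[name] for name in STEP_ORDER]
```

Training and writing are separate calls because the trained model is a reusable artifact: the write step can be repeated later on a projection that contains new nodes, as long as they carry the same feature properties.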

