The following text is better displayed and more accurate in the PDF file : http://romain.raveaux.free.fr/document/AroundGraphMatchingRelaxedAndRefined.pdf
The objective of graph matching is to find correspondences between two attributed graphs $G_1$ and $G_2$. A solution of graph matching is defined as a subset of possible correspondences $\mathcal{Y} \subseteq V_1 \times V_2$, represented by a binary assignment matrix $Y \in \{0,1 \}^{n1 \times n2}$, where $n1$ and $n2$ denote the number of nodes in $G_1$ and $G_2$, respectively. If $u_i \in V_1$ matches $v_k \in V_2$, then $Y_{i,k}=1$, and $Y_{i,k}=0$ otherwise. We denote by $y\in \{0,1 \}^{n1 . n2}$, a column-wise vectorized replica of $Y$. With this notation, graph matching problems can be expressed as finding the assignment vector $y^*$ that maximizes a score function $S(G_1, G_2, y)$ as follows:
\begin{align} y^* =& \underset{y} {\mathrm{argmax}} \quad S(G_1, G_2, y)\\ \text{subject to}\quad & y \in \{0, 1\}^{n1.n2} \\ &\sum_{i=1}^{n1} y_{i,k} \leq 1 \quad \forall k \in [1, \cdots, n2] \label{eq:subgmgmc}\\ &\sum_{k=1}^{n2} y_{i,k} \leq 1 \quad \forall i \in [1, \cdots, n1] \label{eq:subgmgmd} \end{align}
where equations \eqref{eq:subgmgmc} and \eqref{eq:subgmgmd} induce the matching constraints, thus making $y$ an assignment vector.
The function $S(G_1, G_2, y)$ measures the similarity of graph attributes, and is typically decomposed into a first-order similarity function $s(u_i,v_k)$ for a node pair $u_{i} \in V_1$ and $v_k \in V_2$, and a second-order similarity function $s(e_{ij},e_{kl})$ for an edge pair $e_{ij} \in E_1$ and $e_{kl} \in E_2$. Thus, the objective function of graph matching is defined as: \begin{equation} \begin{aligned} S(G_1,G_2,y) =& \sum_{i=1}^{n1} \sum_{k=1}^{n2} s(u_i,v_k) \cdot y_{i,k} + \sum_{i=1}^{n1} \sum_{j=1}^{n1} \sum_{k=1}^{n2} \sum_{l=1}^{n2} s(e_{ij},e_{kl})\cdot y_{ik} \cdot y_{jl}\\ =& \sum_{i=1}^{n1} \sum_{j=1}^{n1} \sum_{k=1}^{n2} \sum_{l=1}^{n2} K_{ik,jl}\cdot y_{ik} \cdot y_{jl}\\ =& y^TKy \end{aligned} \end{equation}
Similarity functions are usually gathered in a symmetric similarity matrix $K \in \mathbb{R}^{n1n2 \times n1n2}$. A non-diagonal element $K_{ik,jl} = s(e_{ij},e_{kl})$ contains the edge similarity and a diagonal term $K_{ik,ik} = s(u_i,v_k)$ represents the vertex similarity. $K$ is called an affinity matrix or compatibility matrix. In this representation, $K_{ik,jl}=0$ means an impossible or very dissimilar matching. In essence, the score accumulates all the similarity values that are relevant to the assignment. The formulation in Problem \ref{model:iqpsubgm} is an integer quadratic program. It expresses the quadratic assignment problem, which is known to be NP-hard.
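As a quick sanity check of the notation, the quadratic form $y^TKy$ can be compared against the explicit double sum on a toy instance (random affinity values, $n1=n2=2$; the indexing follows the column-wise vectorization described above):

```python
import numpy as np

n1, n2 = 2, 2
rng = np.random.default_rng(0)
# Symmetric affinity matrix: diagonal = node similarities,
# off-diagonal = edge-pair similarities (illustrative random values).
A = rng.random((n1 * n2, n1 * n2))
K = (A + A.T) / 2

# Assignment Y matching u_1 -> v_1 and u_2 -> v_2, vectorized column-wise,
# so y[i + k*n1] = Y[i, k].
Y = np.eye(n1, n2)
y = Y.flatten(order='F')

# Score as the quadratic form ...
score = y @ K @ y
# ... equals the explicit sum over pairs of assignments (i,k), (j,l).
explicit = sum(K[i + k * n1, j + l * n1] * Y[i, k] * Y[j, l]
               for i in range(n1) for k in range(n2)
               for j in range(n1) for l in range(n2))
assert np.isclose(score, explicit)
```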
The NP-hard graph matching problem is relaxed by dropping both the binary and the mapping constraints. The model to be solved is then:
\begin{equation} \begin{aligned} y^* =& \underset{y} {\mathrm{argmax}} \quad y^TKy \end{aligned} \end{equation} \begin{align} \text{subject to}\quad & y \in [0,1]^ {\vert V_1 \vert \cdot \vert V_2 \vert }\\ %\label{eq:subgmgmc} &\Vert y \Vert_2 =1 % & \sqrt{\sum_{i=1}^{\vert V_1 \vert} \sum_{k=1}^{\vert V_2 \vert} (y_{i,k})^2} = 1 \end{align}
The optimal $y^*$ is then given by the leading eigenvector of the matrix $K$. See the PDF file for more details (http://romain.raveaux.free.fr/document/AroundGraphMatchingRelaxedAndRefined.pdf).
First, we need to find the eigenvalues $\lambda$ as the roots of the characteristic polynomial: \begin{equation} P(\lambda)=\det( K - \lambda I ) \end{equation} Setting the polynomial equal to zero, $P(\lambda)=0$, gives the roots $\lambda_i \quad \forall i \in [1, \cdots, n1.n2]$; the $\lambda_i$ are the eigenvalues of $K$. The eigenvector corresponding to each eigenvalue can be found by solving for the components of $m_i$ in the equation: \begin{equation} (K - \lambda_i I)m_i=0 \quad \forall i \in [1, \cdots, n1.n2] \end{equation} The eigenvector associated with the highest eigenvalue is the solution of Problem \ref{model:relaxedQAPK}. \begin{equation} k^*= \underset{k \in [1, \cdots, n1.n2 ]}{\mathrm{argmax}} \quad \lambda_k \end{equation}
\begin{equation} y^* = m_{k^*} \end{equation}
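As a sanity check of this statement, `numpy.linalg.eigh` computes all eigenpairs of a symmetric matrix directly, and the leading unit eigenvector attains objective value $\lambda_{k^*}$ (a small random symmetric matrix stands in for $K$; the values are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.random((6, 6))
K = (A + A.T) / 2  # symmetric affinity matrix (toy values)

eigval, eigvec = np.linalg.eigh(K)   # eigenvalues in ascending order
k_star = eigval.argmax()             # index of the largest eigenvalue
y_star = eigvec[:, k_star]           # leading eigenvector, unit L2 norm

# For a unit eigenvector, the relaxed objective equals its eigenvalue.
assert np.isclose(y_star @ K @ y_star, eigval[k_star])
assert np.isclose(np.linalg.norm(y_star), 1.0)
```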
Computing the leading eigenvector $y^*$ of the affinity matrix $K$ can be done using power iterations. \begin{equation} m^{(k+1)}=\frac{K m^{(k)}}{\Vert K m^{(k)} \Vert_2} \end{equation} $m^{(0)}$ is initialized to the all-ones vector. After $N$ iterations, the output is $y^* = m^{(N)}$.
The time complexity of this algorithm is $O(n1^2 \cdot n2^2)$ per power iteration when the matrix $K$ is dense.
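A minimal power-iteration sketch on a random nonnegative symmetric matrix (illustrative values; the affinity matrix $K$ plays this role in the matching setting), checking convergence to the leading eigenvector up to sign:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.random((6, 6))
K = (A + A.T) / 2  # nonnegative symmetric: the leading eigenvalue dominates

# Power iteration starting from the all-ones vector
m = np.ones(K.shape[0])
for _ in range(100):
    m = K @ m
    m /= np.linalg.norm(m)

# Compare with the leading eigenvector from a full decomposition.
eigval, eigvec = np.linalg.eigh(K)
lead = eigvec[:, eigval.argmax()]
# Eigenvectors are defined up to sign, so compare the absolute cosine.
assert np.isclose(abs(m @ lead), 1.0, atol=1e-6)
```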
The solution $y^*$ of the relaxed graph matching problem (Problem \ref{model:relaxedQAPK}) is reshaped into a matrix $Y^* \in \mathbb{R}^{n1 \times n2}$. Then, the matrix $Y^*$ is modified to become a doubly stochastic matrix. In the graph matching context, this modification re-imposes the one-to-one mapping constraints (L1 constraints) $\forall k, \sum_{i} y_{ik} = 1$ and $\forall i, \sum_{k} y_{ik} = 1$. The transformation of $Y^*$ into a doubly stochastic matrix is performed by the Sinkhorn-Knopp algorithm, which alternately normalizes the rows and the columns. Starting with $M^{(0)}=Y^*$: $$M^{(k+1)}_{i,j}=\frac{M^{(k)}_{i,j}}{\sum_j M^{(k)}_{i,j}} \quad \forall i \in [1, \cdots, n1]$$ $$M^{(k+2)}_{i,j}=\frac{M^{(k+1)}_{i,j}}{\sum_i M^{(k+1)}_{i,j}} \quad \forall j \in [1, \cdots, n2]$$
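The alternating row/column normalization can be sketched on a small strictly positive matrix (illustrative random values; a square case for simplicity):

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.random((4, 4)) + 0.1  # strictly positive square matrix (toy values)

for _ in range(50):
    M = M / M.sum(axis=1, keepdims=True)  # normalize rows
    M = M / M.sum(axis=0, keepdims=True)  # normalize columns

# After a few sweeps, both row and column sums are close to one.
assert np.allclose(M.sum(axis=1), 1.0, atol=1e-4)
assert np.allclose(M.sum(axis=0), 1.0, atol=1e-4)
```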
See the PDF file for more details (http://romain.raveaux.free.fr/document/AroundGraphMatchingRelaxedAndRefined.pdf).
The Letter dataset contains graphs that represent distorted letter drawings. It covers the 15 capital letters of the Roman alphabet that consist of straight lines only (A, E, F, H, I, K, L, M, N, T, V, W, X, Y, Z). Each node is labeled with a two-dimensional attribute giving its position relative to a reference coordinate system. Edges are unlabeled. The graph database consists of a training set, a validation set, and a test set of size 750 each. Three levels of distortion are provided.
This dataset is part of IAM Graph Database Repository and it is also linked in the IAPR TC15 resources.
It can be considered as a toy example for graph classification.
Riesen, K. and Bunke, H.: IAM Graph Database Repository for Graph Based Pattern Recognition and Machine Learning. In: da Vitoria Lobo, N. et al. (Eds.), SSPR&SPR 2008, LNCS, vol. 5342, pp. 287-297, 2008.
!wget https://iapr-tc15.greyc.fr/IAM/Letter.zip
!unzip Letter.zip
IAM graphs are provided as GXL files:
<gxl>
<graph id="GRAPH_ID" edgeids="false" edgemode="undirected">
<node id="_0">
<attr name="x">
<float>0.812867</float>
</attr>
<attr name="y">
<float>0.630453</float>
</attr>
</node>
...
<node id="_N">
...
</node>
<edge from="_0" to="_1"/>
...
<edge from="_M" to="_N"/>
</graph>
</gxl>
import numpy as np
import xml.etree.ElementTree as ET
import networkx as nx
import os
import matplotlib.pyplot as plt
%matplotlib inline
def read_letters(file):
    """Parse a GXL file and return a networkx graph."""
    tree_gxl = ET.parse(file)
    root_gxl = tree_gxl.getroot()
    node_label = {}
    node_id = []
    # Parse nodes and their (x, y) attributes
    for i, node in enumerate(root_gxl.iter('node')):
        node_id += [node.get('id')]
        for attr in node.iter('attr'):
            if attr.get('name') == 'x':
                x = float(attr.find('float').text)
            elif attr.get('name') == 'y':
                y = float(attr.find('float').text)
        node_label[i] = [x, y]
    node_id = np.array(node_id)
    # Create adjacency matrix
    am = np.zeros((len(node_id), len(node_id)))
    for edge in root_gxl.iter('edge'):
        s = np.where(node_id == edge.get('from'))[0][0]
        t = np.where(node_id == edge.get('to'))[0][0]
        # Undirected graph
        am[s, t] = 1
        am[t, s] = 1
    # Create the networkx graph
    G = nx.from_numpy_array(am)  # from_numpy_matrix was removed in networkx 3.0
    nx.set_node_attributes(G, node_label, 'position')
    return G
# Select distortion [LOW, MED, HIGH]
distortion = 'LOW'
# Select letter [A, E, F, H, I, K, L, M, N, T, V, W, X, Y, Z]
letter = 'K'
# Select id [0-149]
id=50
# Read the graph and draw it using networkx tools
G1 = read_letters(os.path.join('Letter', distortion, letter+'P1_'+ str(id).zfill(4) +'.gxl'))
print(G1.nodes)
nx.draw(G1, pos=dict(G1.nodes(data='position')))
plt.show()
id=70#52#100
# Read the graph and draw it using networkx tools
G2 = read_letters(os.path.join('Letter', distortion, letter+'P1_'+ str(id).zfill(4) +'.gxl'))
print(G2.nodes)
nx.draw(G2, pos=dict(G2.nodes(data='position')))
plt.show()
n1=G1.number_of_nodes()
G1G2=nx.disjoint_union(G1,G2)
nx.draw(G1G2, pos=dict(G1G2.nodes(data='position')))
plt.show()
# Shift G2's nodes to the right so both graphs are visible side by side
for i in range(n1, G1G2.number_of_nodes()):
    G1G2.nodes[i]['position'] = [G1G2.nodes[i]['position'][0] + 5, G1G2.nodes[i]['position'][1]]
nx.draw(G1G2, pos=dict(G1G2.nodes(data='position')))
plt.show()
def ExtractNodeFeaturesAndAdjacencyMatrix(G):
    nodelist, nodes = map(list, zip(*G.nodes(data='position')))
    GnodesFeatures = np.array(nodes)
    Gadjacency = np.array(nx.adjacency_matrix(G, nodelist=nodelist).todense())
    return GnodesFeatures, Gadjacency
print("Printing the first node")
print(G1.nodes[0])
print("Printing the edges : they have no attributes")
print(G1.edges)
print("Printing the nodes attributes")
print(G1.nodes(data='position'))
G1nodesFeatures,G1adjacency=ExtractNodeFeaturesAndAdjacencyMatrix(G1)
print(G1nodesFeatures.shape)
print(G1nodesFeatures)
print(G1adjacency)
G2nodesFeatures,G2adjacency=ExtractNodeFeaturesAndAdjacencyMatrix(G2)
Let us see what happens if we use the scalar product as a similarity measure.
print(G1nodesFeatures.shape)
print(G2nodesFeatures.shape)
NodeSimilarity=G1nodesFeatures.dot(G2nodesFeatures.T)
print(NodeSimilarity.shape)
print(NodeSimilarity)
print(NodeSimilarity.argmax(axis=1))
plt.imshow(NodeSimilarity,cmap="gray")
The scalar product is not very suitable for matching nodes. Let us try the Euclidean distance.
def distance_matrix_np(set1, set2):
    """Return the matrix of pairwise Euclidean distances (loop version)."""
    dm = np.zeros((set1.shape[0], set2.shape[0]))
    for i in range(set1.shape[0]):
        for j in range(set2.shape[0]):
            dm[i, j] = np.linalg.norm(set1[i, :] - set2[j, :], 2)
    return dm
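For completeness, the double loop can be replaced by a broadcast (vectorized) version; note the axis order, `set1[:, None] - set2[None, :]`, which yields an $(n1, n2, d)$ array:

```python
import numpy as np

def distance_matrix_vec(set1, set2):
    """Pairwise Euclidean distances, vectorized with numpy broadcasting."""
    diff = set1[:, None, :] - set2[None, :, :]  # shape (n1, n2, d)
    return np.sqrt((diff ** 2).sum(axis=-1))    # shape (n1, n2)

# Check agreement with a naive per-pair computation on random points.
rng = np.random.default_rng(4)
s1, s2 = rng.random((5, 2)), rng.random((7, 2))
dm = distance_matrix_vec(s1, s2)
ref = np.array([[np.linalg.norm(a - b) for b in s2] for a in s1])
assert np.allclose(dm, ref)
```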
NodeDisSimilarity=distance_matrix_np(G1nodesFeatures,G2nodesFeatures)
print(NodeDisSimilarity.shape)
print(NodeDisSimilarity)
print(NodeDisSimilarity.argmin(axis=1))
plt.imshow(NodeDisSimilarity,cmap="gray")
The Euclidean distance discriminates well between nodes. Let us turn the dissimilarity into a similarity: $s = cst - d$.
cst=10
NodeSimilarity=cst-NodeDisSimilarity
print(NodeSimilarity)
print(NodeSimilarity.argmax(axis=1))
plt.imshow(NodeSimilarity,cmap="gray")
def BuildKv1(G1nodesFeatures, G2nodesFeatures, G1adjacency, G2adjacency):
    n1 = G1nodesFeatures.shape[0]
    n2 = G2nodesFeatures.shape[0]
    cst = 10
    K = np.zeros((n1 * n2, n1 * n2))
    print(K.shape)
    ik = -1
    for i in range(n1):
        for k in range(n2):
            ik += 1
            jl = -1
            for j in range(n1):
                for l in range(n2):
                    jl += 1
                    if i == j and k == l:
                        # Diagonal term: node similarity
                        feature1 = G1nodesFeatures[i]
                        feature2 = G2nodesFeatures[k]
                        d = np.linalg.norm(feature1 - feature2, 2)
                        K[ik, jl] = cst - d
                    else:
                        # Off-diagonal term: edge compatibility
                        e1 = G1adjacency[i, j]
                        e2 = G2adjacency[k, l]
                        if e1 == e2:
                            K[ik, jl] = cst
                        else:
                            K[ik, jl] = 0
    return K
K=BuildKv1(G1nodesFeatures,G2nodesFeatures,G1adjacency,G2adjacency)
print(K)
plt.imshow(K,cmap="gray")
A richer version of $K$ can be computed using $\exp(-d)$: the exponential of minus a distance maps $d \in [0, \infty)$ to a similarity in $(0, 1]$.
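A quick numerical check of this property (toy distance values):

```python
import numpy as np

d = np.array([0.0, 0.5, 1.0, 5.0])  # illustrative distances
s = np.exp(-d)
assert np.isclose(s[0], 1.0)        # zero distance -> maximal similarity
assert np.all((s > 0) & (s <= 1))   # similarities stay in (0, 1]
assert np.all(np.diff(s) < 0)       # larger distance -> smaller similarity
```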
def BuildKv2(G1nodesFeatures, G2nodesFeatures, G1adjacency, G2adjacency):
    n1 = G1nodesFeatures.shape[0]
    n2 = G2nodesFeatures.shape[0]
    K = np.zeros((n1 * n2, n1 * n2))
    ik = -1
    for i in range(n1):
        for k in range(n2):
            ik += 1
            jl = -1
            for j in range(n1):
                for l in range(n2):
                    jl += 1
                    if i == j and k == l:
                        # Diagonal term: exp(-normalized node distance)
                        feature1 = G1nodesFeatures[i]
                        feature2 = G2nodesFeatures[k]
                        d = np.linalg.norm(feature1 - feature2, 2)
                        d1 = np.linalg.norm(feature1, 2)
                        d2 = np.linalg.norm(feature2, 2)
                        dnorm = d / np.max([d1, d2])
                        K[ik, jl] = np.exp(-dnorm)
                    else:
                        # Off-diagonal term: edge-length compatibility
                        e1 = G1adjacency[i, j]
                        e2 = G2adjacency[k, l]
                        if e1 == e2:
                            feature1i = G1nodesFeatures[i]
                            feature1j = G1nodesFeatures[j]
                            dij = np.linalg.norm(feature1i - feature1j, 2)
                            feature2k = G2nodesFeatures[k]
                            feature2l = G2nodesFeatures[l]
                            dkl = np.linalg.norm(feature2k - feature2l, 2)
                            d = np.sqrt((dij - dkl) ** 2)
                            dnorm = d / np.max([dij, dkl])
                            K[ik, jl] = 0.125 * np.exp(-dnorm)
                        else:
                            K[ik, jl] = 0
    return K
K=BuildKv2(G1nodesFeatures,G2nodesFeatures,G1adjacency,G2adjacency)
print(K)
plt.imshow(K,cmap="gray")
def GraphMatchingEigenVector(K, n1, n2):
    # K is symmetric, so eigh returns real eigenvalues and eigenvectors
    eigval, eigvec = np.linalg.eigh(K)
    print('Printing the eigenvalues shape')
    print(eigval.shape)
    print('Printing the eigenvalues')
    print(eigval)
    print('Printing the eigenvectors shape')
    print(eigvec.shape)
    kmax = eigval.argmax()
    print('Printing the index of the highest eigenvalue')
    print(kmax)
    y = eigvec[:, kmax]
    print('Printing the shape of the mapping vector')
    print(y.shape)
    Y = y.reshape((n1, n2))
    Y = np.abs(Y)  # eigenvectors are defined up to sign
    print('Printing the node-to-node mapping matrix')
    print(Y)
    return y, Y
n1=G1nodesFeatures.shape[0]
n2=G2nodesFeatures.shape[0]
y,Y=GraphMatchingEigenVector(K,n1,n2)
print("Printing the best matching for each node of G1")
bestmatch = Y.argmax(axis=1)
print(bestmatch)
print("Computing the objective function")
res=y.T.dot(K)
obj=res.dot(y)
print("Printing the objective value. It should be equal to the highest eigenvalue")
print(obj)
print("Printing the norm of the mapping vector y. It should be equal to one")
print(np.linalg.norm(y,2))
G1G2 = nx.disjoint_union(G1, G2)
for i in range(n1, G1G2.number_of_nodes()):
    G1G2.nodes[i]['position'] = [G1G2.nodes[i]['position'][0] + 5, G1G2.nodes[i]['position'][1]]
edgelista = []
for i in range(n1):
    k = bestmatch[i]
    edgelista.append((i, k + n1))
nx.draw(G1G2, pos=dict(G1G2.nodes(data='position')))
nx.draw(G1G2, pos=dict(G1G2.nodes(data='position')), edgelist=edgelista,
        width=1, alpha=0.5, edge_color='b')
plt.show()
def GraphMatchingPowerIteration(K, n1, n2, N):
    m = np.ones(n1 * n2)
    for i in range(N):
        num = K.dot(m)
        denum = np.linalg.norm(num, 2)
        m = num / denum
    y = m
    Y = y.reshape((n1, n2))
    return y, Y
N=100
y,Y=GraphMatchingPowerIteration(K,n1,n2,N)
print('Printing the node-to-node mapping matrix')
print(Y)
print("Printing the best matching for each node of G1")
print(Y.argmax(axis=1))
print("Computing the objective function")
res=y.T.dot(K)
obj=res.dot(y)
print("Printing the objective value. It should be equal to the highest eigenvalue")
print(obj)
print("Printing the norm of the mapping vector y. It should be equal to one")
print(np.linalg.norm(y,2))
def SinkhornKnopp(Y, N):
    M = Y.copy()
    for i in range(N):
        res = M.sum(axis=1)                # row sums
        Mk = np.divide(M, res[:, None])    # normalize rows
        res = Mk.sum(axis=0)               # column sums
        Mk2 = np.divide(Mk, res[None, :])  # normalize columns
        M = Mk2
    return M
N=10
M=SinkhornKnopp(Y,N)
print("Checking if the columns sum to one")
print(M.sum(axis=0))
print("Checking if the rows sum to one")
print(M.sum(axis=1))
print("Printing the best matching for each node of G1")
bestmatch = M.argmax(axis=1)
print(bestmatch)
G1G2 = nx.disjoint_union(G1, G2)
for i in range(n1, G1G2.number_of_nodes()):
    G1G2.nodes[i]['position'] = [G1G2.nodes[i]['position'][0] + 5, G1G2.nodes[i]['position'][1]]
edgelista = []
for i in range(n1):
    k = bestmatch[i]
    edgelista.append((i, k + n1))
nx.draw(G1G2, pos=dict(G1G2.nodes(data='position')))
nx.draw(G1G2, pos=dict(G1G2.nodes(data='position')), edgelist=edgelista,
        width=1, alpha=0.5, edge_color='b')
plt.show()
def draw(j):
    ax.cla()
    ax.axis('off')
    ax.set_title('Matching Graph 0 and Graph: %d' % j)
    G2 = read_letters(os.path.join('Letter', distortion, letter + 'P1_' + str(j).zfill(4) + '.gxl'))
    G2nodesFeatures, G2adjacency = ExtractNodeFeaturesAndAdjacencyMatrix(G2)
    K = BuildKv2(G1nodesFeatures, G2nodesFeatures, G1adjacency, G2adjacency)
    N = 100
    n1 = G1nodesFeatures.shape[0]
    n2 = G2nodesFeatures.shape[0]
    y, Y = GraphMatchingPowerIteration(K, n1, n2, N)
    N = 10
    M = SinkhornKnopp(Y, N)
    bestmatch = M.argmax(axis=1)
    G1G2 = nx.disjoint_union(G1, G2)
    for i in range(n1, G1G2.number_of_nodes()):
        G1G2.nodes[i]['position'] = [G1G2.nodes[i]['position'][0] + 5, G1G2.nodes[i]['position'][1]]
    edgelista = []
    for i in range(n1):
        k = bestmatch[i]
        edgelista.append((i, k + n1))
    nx.draw(G1G2, pos=dict(G1G2.nodes(data='position')), ax=ax)
    nx.draw(G1G2, pos=dict(G1G2.nodes(data='position')), edgelist=edgelista,
            width=1, alpha=0.5, edge_color='b', ax=ax)
# Select distortion [LOW, MED, HIGH]
distortion = 'LOW'
# Select letter [A, E, F, H, I, K, L, M, N, T, V, W, X, Y, Z]
letter = 'K'
# Select id [0-149]
id=0
# Read the graph and draw it using networkx tools
G1 = read_letters(os.path.join('Letter', distortion, letter+'P1_'+ str(id).zfill(4) +'.gxl'))
G1nodesFeatures,G1adjacency=ExtractNodeFeaturesAndAdjacencyMatrix(G1)
from matplotlib import animation
from IPython.display import HTML

fig = plt.figure(dpi=150)
fig.clf()
ax = fig.subplots()
draw(0)  # draw the matching for the first pair
plt.close()
ani = animation.FuncAnimation(fig, draw, frames=149, interval=1000)
ani.save('GraphMachingLetterKLOW.mp4')
HTML(ani.to_html5_video())
# Select distortion [LOW, MED, HIGH]
distortion = 'LOW'
# Select letter [A, E, F, H, I, K, L, M, N, T, V, W, X, Y, Z]
letter = 'K'
# Select id [0-149]
id=0
# Read the graph and draw it using networkx tools
G1 = read_letters(os.path.join('Letter', distortion, letter+'P1_'+ str(id).zfill(4) +'.gxl'))
G1nodesFeatures,G1adjacency=ExtractNodeFeaturesAndAdjacencyMatrix(G1)
for j in range(149):
    G2 = read_letters(os.path.join('Letter', distortion, letter + 'P1_' + str(j).zfill(4) + '.gxl'))
    G2nodesFeatures, G2adjacency = ExtractNodeFeaturesAndAdjacencyMatrix(G2)
    N = 100
    n1 = G1nodesFeatures.shape[0]
    n2 = G2nodesFeatures.shape[0]
    K = BuildKv2(G1nodesFeatures, G2nodesFeatures, G1adjacency, G2adjacency)
    y, Y = GraphMatchingPowerIteration(K, n1, n2, N)
    N = 10
    M = SinkhornKnopp(Y, N)
    bestmatch = M.argmax(axis=1)
    G1G2 = nx.disjoint_union(G1, G2)
    for i in range(n1, G1G2.number_of_nodes()):
        G1G2.nodes[i]['position'] = [G1G2.nodes[i]['position'][0] + 5, G1G2.nodes[i]['position'][1]]
    edgelista = []
    for i in range(n1):
        k = bestmatch[i]
        edgelista.append((i, k + n1))
    nx.draw(G1G2, pos=dict(G1G2.nodes(data='position')))
    nx.draw(G1G2, pos=dict(G1G2.nodes(data='position')), edgelist=edgelista,
            width=1, alpha=0.5, edge_color='b')
    plt.savefig('./image (' + str(j) + ') res.png')
    plt.show()
listimages = []
fig = plt.figure(dpi=300)  # just for display
ax = fig.subplots()
ax.axis('off')
for j in range(149):
    im = plt.imread('./image (' + str(j) + ') res.png')
    imdata = plt.imshow(im)
    listimages.append([imdata])
from matplotlib import animation, rc
from IPython.display import HTML
ani = animation.ArtistAnimation(fig, listimages, interval=1000, blit=True,
repeat_delay=1000)
ani.save('GraphMachingLetterK.mp4')
plt.show()
HTML(ani.to_html5_video())