Clustering, in general, is a basic primitive of statistics and machine learning.
It is a central task in a number of fields, and a great deal of work has been devoted to it.
Graph partitioning problems and especially bisection and multisection with their numerous variations are central to the study of data clustering.
Bisection: given a graph, we are looking for a balanced bipartition that minimizes the number of edges across the two parts.
Multisection: given a graph, we are looking for a balanced k-partition that minimizes the total number of edges across the k parts.
Both of these problems are NP-Hard.
Even worse: no polynomial time approximation algorithm can guarantee a finite approximation ratio for either unless P=NP.
But do these bad instances really appear in practice?
In other words, what about the average case analysis for these problems?
The purpose of this talk will be to answer the above questions for these two problems and to demonstrate the techniques used.
Trying to answer similar questions, and following the work of [BO87], Feige and Kilian in [FK01] studied bisection under the following planted partition semirandom model:
- Partition the n vertices into two equal-size sets S and \bar{S}.
- Include each edge inside S or inside \bar{S} independently with probability p.
- Include each edge between S and \bar{S} independently with probability q, where q < p.
- Finally, a monotone adversary may arbitrarily add edges inside S or \bar{S} and remove edges between S and \bar{S}.
We will refer to G_{rand} as a graph drawn randomly from the distribution \mathcal{G}_{n,p,q} defined by the first three bullets.
In this setting, the task is to recover the initial planted bisection.
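To make the model concrete, here is a minimal sketch of how one might sample G_{rand} from \mathcal{G}_{n,p,q} before the adversary acts. This is illustration only, assuming numpy; the function name sample_planted_bisection is ours:

```python
import numpy as np

def sample_planted_bisection(n, p, q, seed=None):
    """Sample G_rand ~ G_{n,p,q}: plant a balanced bisection (S, S-bar),
    include each edge inside a side with probability p and each edge
    across the cut with probability q < p. Returns the adjacency matrix
    and the planted indicator vector s in {-1,+1}^n. (Illustrative
    helper, not code from [FK01].)"""
    assert n % 2 == 0, "a bisection needs an even number of vertices"
    rng = np.random.default_rng(seed)
    s = np.concatenate([np.ones(n // 2), -np.ones(n // 2)])  # planted sides
    same_side = np.outer(s, s) > 0          # True iff i and j share a side
    probs = np.where(same_side, p, q)       # per-pair edge probability
    upper = np.triu(rng.random((n, n)) < probs, k=1)  # sample i < j only
    A = (upper | upper.T).astype(int)       # symmetrize; no self-loops
    return A, s
```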
Why this model?
Average case analysis is equivalent to considering the probability of success of an algorithm on a random graph.
However, choosing graphs uniformly from the set of all possible graphs is not very helpful: only a small fraction of all n-vertex graphs have meaningful bisections.
By restricting our attention to this model, we are guaranteed graphs that are likely to have non-trivial solutions, are robust, and present a realistic model for data sets studied in practice.
Let b(G) be the value of the minimum bisection of G: b(G)=\min_{(S, \bar{S})}\left| E(S, \bar{S}) \right|, where the minimum is over balanced bipartitions (S, \bar{S}).
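On very small instances, b(G) can be computed by brute force over all balanced splits, which is handy for sanity checks later; a hypothetical helper, assuming numpy:

```python
from itertools import combinations
import numpy as np

def min_bisection(A):
    """Exact b(G) by exhausting every balanced split (S, S-bar) and
    counting the crossing edges. Exponential in n; small graphs only."""
    n = A.shape[0]
    best = None
    for S in combinations(range(n), n // 2):
        mask = np.zeros(n, dtype=bool)
        mask[list(S)] = True
        cut = A[mask][:, ~mask].sum()        # edges from S to S-bar
        best = cut if best is None else min(best, cut)
    return int(best)
```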
The goal is to find a function h(G) such that h provides a heuristic in the sense that for many graphs: h(G)=b(G).
Recall that after we randomly pick a graph G_{rand}, the adversary can act on it.
Definition. ([FK01]) A function h is robust with respect to monotone adversaries if for every graph G_1 and every graph G_2 obtainable by a single monotone transformation of G_1, h(G_1)=b(G_1) implies h(G_2)=b(G_2).
A monotone adversary is powerless against a robust function h; whenever G_{rand} is such that h(G_{rand})=b(G_{rand}), the monotone adversary cannot prevent h(G)=b(G).
Lemma. If a function h satisfies the following two properties:
- "lower bound": for every graph G, h(G)\leq b(G);
- "bounded monotonicity": for every graph G and every graph G^+ obtained from G by adding a single edge, h(G)\leq h(G^+)\leq h(G)+1;
then h is robust.
Proof. Let G_1 be an arbitrary graph for which h(G_1)=b(G_1), and let G_2 be a graph obtained by a single monotone transformation, relative to a (minimal) bisection (S, \bar{S}) of G_1 with b(G_1) edges. We need to show that h(G_2)=b(G_2).
Proof (cont'd). The inequality h(G_2)\leq b(G_2) follows directly from the lower bound property for h. It remains to show that b(G_2)\leq h(G_2). Suppose first that G_2=G_1+e for an edge e inside S or inside \bar{S}: then b(G_2)=b(G_1), since the cut of (S, \bar{S}) is unaffected and adding an edge cannot decrease the minimum bisection, and by bounded monotonicity h(G_2)\geq h(G_1)=b(G_1)=b(G_2). Suppose instead that G_2=G_1-e for an edge e between S and \bar{S}: then b(G_2)=b(G_1)-1, and since G_1 is obtained from G_2 by adding e back, bounded monotonicity gives h(G_2)\geq h(G_1)-1=b(G_1)-1=b(G_2).
QED
To compute the bisection in this model, we would like to find a polytime computable function h that is robust, and in addition is "probably good":
h(G_{rand})=b(G_{rand}) with high probability over the choice of G_{rand}.
[FK01] proposed the following semidefinite relaxation of bisection.
Find an order n matrix X=\{x_{ij}\} satisfying:
\min_X h_X(G) := \sum_{(i,j) \in E} \frac{1 - x_{ij}}{2}
\mbox{subject to} \quad (\forall i \in V)\ x_{ii} = 1, \quad \sum_{i, j \in V} x_{ij} = 0, \quad X \succeq 0
Let h(G) be equal to the solution of the above semidefinite relaxation of bisection: h(G)=\min_X h_X(G).
By using semidefinite programming h(G) can be computed in polynomial time.
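As a sketch of what this computation could look like, here is one way to encode the relaxation with the cvxpy modeling library (our own encoding, solved by whatever SDP-capable solver cvxpy has available):

```python
import cvxpy as cp
import numpy as np

def h_sdp(A):
    """The [FK01] semidefinite relaxation of bisection, encoded in cvxpy.
    Summing A_ij * (1 - x_ij) over all ordered pairs counts each edge
    twice, hence the division by 4 instead of 2."""
    n = A.shape[0]
    X = cp.Variable((n, n), symmetric=True)
    objective = cp.sum(cp.multiply(A, 1 - X)) / 4
    constraints = [cp.diag(X) == 1,   # x_ii = 1 for all i
                   cp.sum(X) == 0,    # balance constraint
                   X >> 0]            # X positive semidefinite
    cp.Problem(cp.Minimize(objective), constraints).solve()
    return objective.value
```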
Proposition. ([FK01]) The function h is robust.
Proof. By the lemma above, it suffices to show that h satisfies the two properties: "lower bound" and "bounded monotonicity".
"lower bound": To see that h(G)\leq b(G), consider the indicator vector \mathbf{s}=(s_1, \ldots, s_n) \in \{-1,+1\}^n where: s_i = \left\{ \begin{array}{rcl} +1 & \mbox{when} & \mbox{vertex } i \in S \\ -1 & \mbox{when} & \mbox{vertex } i \in \bar{S} \end{array}\right.
Proof (cont'd). Let X=\mathbf{s}\mathbf{s}^{\top} and notice that x_{ij}=-1 when vertices i and j are on opposite sides of the cut and x_{ij}=1 otherwise, and also X satisfies the constraints of the semidefinite program.
Moreover h_X(G)=b(G), because \frac{1-x_{ij}}{2} = \left\{ \begin{array}{rcl} 1 & \mbox{when} & x_{ij}=-1 \\ 0 & \mbox{when} & x_{ij}=1 \end{array}\right. and recall that by definition h(G) \leq h_X(G).
Proof (cont'd). "bounded monotonicity": use the fact that 0 \leq (1-x_{ij})/2 \leq 1. And for any X and any G, G^+: h_X(G)\leq h_X(G^+)\leq h_X(G)+1.
Observe that feasibility of a matrix X depends only on the number of vertices in a graph and not on the actual graph itself. Taking X to be an optimal solution for h(G^+) we obtain: h(G) \leq h_X(G) \leq h_X(G^+)=h(G^+), and taking X to be the optimal solution for h(G) we obtain: h(G^+) \leq h_X(G^+) \leq h_X(G)+1=h(G)+1.
QED
Just like linear programs, semidefinite programs have the notion of duality.
Moreover the optimal value of a dual maximization problem is never larger than the optimal value of the primal minimization problem.
By the standard theory, the dual of the primal semidefinite program above is:
\max \quad \frac{m}{2} + \frac{\sum_{i\in V} y_i}{4}
\mbox{subject to} \quad M = -A - y_0 J - Y \succeq 0
where m is the number of edges of the graph, A is its adjacency matrix, J is the all-ones matrix of order n, y_0 is an auxiliary variable that appears only in the feasibility constraint, and Y is a diagonal matrix with y_1, \ldots, y_n along its diagonal.
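Continuing the illustrative cvxpy sketch, the dual can be encoded directly; by weak duality its optimal value never exceeds the primal value h(G):

```python
import cvxpy as cp
import numpy as np

def h_dual(A):
    """The dual program: maximize m/2 + sum(y_i)/4 subject to
    M = -A - y0*J - Y being positive semidefinite, with Y = diag(y)."""
    n = A.shape[0]
    m = A.sum() / 2                          # number of edges
    y = cp.Variable(n)
    y0 = cp.Variable()
    J = np.ones((n, n))
    M = -A - y0 * J - cp.diag(y)
    prob = cp.Problem(cp.Maximize(m / 2 + cp.sum(y) / 4), [M >> 0])
    prob.solve()
    return prob.value
```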
Using this dual formulation Feige and Kilian proved the following main lemma:
Lemma. ([FK01]) For a sufficiently large constant c, the function h defined above is "probably good" when p-q \geq c \sqrt{p \frac{\log n}{n}}.
That is, for the described parameter range, they recover the planted bisection with high probability over the choice of G_{rand}.
Proof. We want to show that if p-q is sufficiently large, then w.h.p. the dual has a feasible solution with value b(G_{rand}). We now "guess" a feasible solution for the dual, i.e., the y_i's and the y_0: for every 1\leq i \leq n, let y_i be the contribution of vertex i to the bisection (S, \bar{S}); namely, y_i is the difference between the number of neighbors that vertex i has on the other side of the bisection and the number of neighbors that vertex i has on its own side of the bisection.
The value of the objective function of the dual program is then: \frac{m}{2} + \frac{2 \left|E(S, \bar{S})\right| - 2\left(m - \left|E(S, \bar{S})\right|\right)}{4} = \left|E(S, \bar{S})\right|, which is b(G_{rand}) as desired.
It remains to show that w.h.p. (over the choice of G_{rand}) it is possible to choose y_0 such that the matrix M = -A - y_0 J - Y is positive semidefinite.
We shall choose y_0=-1, and thus M = -A + J - Y, and show that the matrix M has no negative eigenvalues, implying that it is positive semidefinite.
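This guess is easy to test numerically: from a sampled adjacency matrix A and the planted indicator vector s, build the proposed M and inspect its smallest eigenvalue (a sanity-check sketch, assuming numpy; the helper name is ours):

```python
import numpy as np

def certificate_min_eigenvalue(A, s):
    """Build the [FK01] dual guess y_i = (#cross-neighbors of i) -
    (#same-side neighbors of i) with y0 = -1, so M = -A + J - diag(y),
    and return M's smallest eigenvalue. A nonnegative value means the
    guess is dual feasible and certifies the planted cut's value."""
    n = A.shape[0]
    same = np.outer(s, s) > 0                # i, j on the same side?
    y = (A * ~same).sum(axis=1) - (A * same).sum(axis=1)
    M = -A + np.ones((n, n)) - np.diag(y)
    return np.linalg.eigvalsh(M).min()
```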
This part of the proof boils down to proving tight asymptotic bounds on the spectrum of the graph family \mathcal{G}_{n,p,q}, based on the following result.
Theorem. ([FK01]) Let c be a sufficiently large constant, and let B be the adjacency matrix of a random graph with n vertices in which each edge appears with probability p. Then with high probability: \max\left[ |\lambda_2(B)|, |\lambda_n(B)| \right] \leq \max\left[ c \sqrt{pn\log n},\ c\sqrt{(1-p)n\log n} \right].
Let C be the adjacency matrix of a random balanced bipartite graph in which each edge appears with probability p. Then with high probability: \max\left[ |\lambda_2(C)|, |\lambda_{n-1}(C)| \right] \leq \max\left[ c \sqrt{pn\log n},\ c\sqrt{(1-p)n\log n} \right].
Because the rest of the proof is too long and technical, it is omitted.
QED
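One can get a feel for the first bound empirically by sampling G(n,p) and comparing \max\left[|\lambda_2|, |\lambda_n|\right] against c\sqrt{pn\log n}. The constant c below is an arbitrary placeholder; the theorem only promises some sufficiently large c:

```python
import numpy as np

def spectral_bound_check(n=400, p=0.1, c=3.0, seed=0):
    """Sample a G(n,p) adjacency matrix and compare the observed
    max(|lambda_2|, |lambda_n|) with c*sqrt(p*n*log n)."""
    rng = np.random.default_rng(seed)
    upper = np.triu(rng.random((n, n)) < p, k=1)
    B = (upper | upper.T).astype(float)
    lam = np.linalg.eigvalsh(B)                # ascending order
    observed = max(abs(lam[-2]), abs(lam[0]))  # lambda_2 and lambda_n
    return observed, c * np.sqrt(p * n * np.log(n))
```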
So, in trying to answer questions on the behavior of bisection in the average case, we presented a planted partition semirandom model.
We gave a spectral method that retrieves the planted bisection exactly, with high probability over the choice of the random graph, under a p-q parameter range constraint.
Finally, this method is constructive and we can actually retrieve the planted partition without too much effort.
Naturally the work of [FK01] sets the basis for a study of multisection under a more general planted partition semirandom model.
A. Kolla, N. Agarwal, and I are currently working on extending the result to the more general problem of multisection.
In what follows we will try to present the main idea, the first results and the difficulties that we have encountered up to now.
The natural generalization of the planted partition semirandom model for multisection is:
- Partition the n vertices into k equal-size sets P_1, \ldots, P_k.
- Include each edge inside a part independently with probability p.
- Include each edge between two different parts independently with probability q, where q < p.
- Finally, a monotone adversary may arbitrarily add edges inside parts and remove edges between parts.
Again, we are interested in retrieving the initial planted partition.
Let m(G) be the value of the minimum multisection of G: m(G)=\min_{\mathcal{P}}\left| E(P_1, P_2, \ldots, P_k) \right|, where the minimum is over balanced k-partitions \mathcal{P}=(P_1, \ldots, P_k) and E(P_1, \ldots, P_k) denotes the set of edges whose endpoints lie in different parts.
As before, our goal is to find a function \eta(G) such that \eta provides a heuristic in the sense that for many graphs: \eta(G)=m(G).
We want to use the same method, so to compute the multisection in our model we would like to find a polytime computable function \eta that is robust, and in addition is probably good:
\eta(G_{rand})=m(G_{rand}) with high probability over the choice of G_{rand}.
We propose the following semidefinite relaxation of multisection for \eta.
For every vector x_j of the bisection relaxation, define k new vectors x_1^j, x_2^j, \ldots, x_k^j, one per part, for the multisection.
\min_{\mathbf{X}} \eta_{\mathbf{X}}(G) := \frac{1}{2k} \sum_{(j_1,j_2) \in E} \sum_{i_1, i_2 \in [k]} \left\| \left(x_{i_1}^{j_1}-x_{i_2}^{j_1}\right) - \left(x_{i_1}^{j_2}-x_{i_2}^{j_2}\right) \right\|^2
\mbox{subject to} \quad (\forall j \in V)\ \sum_{i_1, i_2 \in [k]} \left\| x_{i_1}^{j} - x_{i_2}^{j} \right\| = k-1 \quad (\mbox{dual variable } y_j)
\quad (\forall i_1, i_2 \in [k])\ \sum_{j_1, j_2 \in V} \left(x_{i_1}^{j_1}-x_{i_2}^{j_1}\right) \left(x_{i_1}^{j_2}-x_{i_2}^{j_2}\right) = 0 \quad (\mbox{dual variable } y_0)
Let \eta(G) be equal to the solution of the above semidefinite relaxation of multisection: \eta(G)=\min_{\mathbf{X}} \eta_{\mathbf{X}}(G).
We rewrite the semidefinite program in matrix form by defining an (nk \times nk)-matrix as follows.
Let \mathbf{X} be an (n \times n) matrix of (k \times k) matrices. So, instead of an (n \times n) matrix, we have a blown-up matrix obtained by replacing each of its cells with a (k \times k) matrix.
where \mathbf{X}[j_1, i_1, j_2, i_2] = x_{i_1}^{j_1} \cdot x_{i_2}^{j_2}. We use this notation to keep the vertex and part indices separate.
Which yields:
\min \quad \frac{L \otimes \mathbb{A}}{k} \bullet \mathbf{X}
\mbox{subject to} \quad (\forall j \in V)\ \left(\mathbb{1}_j \otimes \mathbb{A}\right) \bullet \mathbf{X} = k-1 \quad (\mbox{dual variable } y_j)
\quad \left(J \otimes \mathbb{A}\right) \bullet \mathbf{X} = 0 \quad (\mbox{dual variable } y_0)
where \otimes is the standard Kronecker product, defined as: \mbox{if } A= \left[ \begin{array}{cc} a_{11} & a_{12} \\ a_{21} & a_{22} \end{array} \right] \mbox{ and } B= \left[ \begin{array}{cc} b_{11} & b_{12} \\ b_{21} & b_{22} \end{array} \right] \mbox{ then } A \otimes B = \left[ \begin{array}{cc} a_{11}\cdot B & a_{12}\cdot B \\ a_{21}\cdot B & a_{22}\cdot B \end{array} \right] = \left[ \begin{array}{ c c : c c } a_{11}\cdot b_{11} & a_{11}\cdot b_{12} & a_{12}\cdot b_{11} & a_{12}\cdot b_{12} \\ a_{11}\cdot b_{21} & a_{11}\cdot b_{22} & a_{12}\cdot b_{21} & a_{12}\cdot b_{22} \\ \hdashline a_{21}\cdot b_{11} & a_{21}\cdot b_{12} & a_{22}\cdot b_{11} & a_{22}\cdot b_{12} \\ a_{21}\cdot b_{21} & a_{21}\cdot b_{22} & a_{22}\cdot b_{21} & a_{22}\cdot b_{22} \end{array} \right]
\mathbb{A} is the (k \times k) Laplacian matrix of a clique of size k: \mathbb{A}= \left[ \begin{array}{cccc} k-1 & -1 & \ldots & -1 \\ -1 & k-1 & & \vdots \\ \vdots & & \ddots & -1 \\ -1 & \ldots & -1 & k-1 \end{array} \right] \, , and \mathbb{1}_j is the (n \times n) matrix that has 1 at the (j,j) position and 0 everywhere else.
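Assembling these objects is mechanical; the short numpy sketch below (our own helper) builds \mathbb{A} = kI - J and the (nk \times nk) objective matrix (L \otimes \mathbb{A})/k:

```python
import numpy as np

def multisection_objective_matrix(A, k):
    """From an adjacency matrix A, form the graph Laplacian L, the
    k-clique Laplacian A_k = k*I - J (diagonal k-1, off-diagonal -1),
    and return the (nk x nk) objective matrix (L kron A_k)/k."""
    L = np.diag(A.sum(axis=1)) - A
    A_k = k * np.eye(k) - np.ones((k, k))
    return np.kron(L, A_k) / k
```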
Using the same notion of robustness introduced for bisection, we show that \eta is robust as well and so we can simplify our analysis by ignoring the adversary.
By the notion of semidefinite duality explained before, it suffices to compute the following dual semidefinite program:
\max \quad (k-1) \sum_{j \in V} y_j
\mbox{subject to} \quad \frac{L \otimes \mathbb{A}}{k} - Y \otimes \mathbb{A} - y_0 J \otimes \mathbb{A} \succeq 0
Notice that: \frac{L \otimes \mathbb{A}}{k} - Y \otimes \mathbb{A} - y_0 J \otimes \mathbb{A} \succeq 0 \iff \left( \frac{L}{k} - Y - y_0 J \right) \otimes \mathbb{A} \succeq 0
But \mathbb{A} is the Laplacian of a graph and as such is positive semidefinite (and nonzero), hence the above reduces to: M = \frac{L}{k} - Y - y_0 J \succeq 0
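The reduction rests on the fact that the eigenvalues of a Kronecker product are the pairwise products of the factors' eigenvalues, so with \mathbb{A} \succeq 0 and nonzero the sign pattern is decided by M alone; a quick numerical illustration of this equivalence (ours, not part of the proof):

```python
import numpy as np

def kron_psd_agrees(M, k, tol=1e-9):
    """Check numerically that M kron A_k is PSD exactly when M is PSD,
    where A_k = k*I - J is the PSD clique Laplacian."""
    A_k = k * np.eye(k) - np.ones((k, k))
    kron_psd = np.linalg.eigvalsh(np.kron(M, A_k)).min() >= -tol
    m_psd = np.linalg.eigvalsh(M).min() >= -tol
    return kron_psd == m_psd
```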
This step is important not only because it simplifies the calculations, but also because M is now again an (n \times n) matrix.
Moreover, for k=2 the condition M = \frac{L}{2} - Y - y_0 J \succeq 0 matches, after absorbing the diagonal degree matrix of L into Y and rescaling, the matrix -A - y_0 J - Y that [FK01] proved to be positive semidefinite.
Currently, using the dual formulation presented, we have shown that \eta is probably good for a reasonable p-q parameter range, under the strong assumption that:
Assumption. All the inter-part (q) edges form a random k-partite d-regular graph.
That is, under the above assumption, we recover the planted multisection with high probability over the choice of G_{rand}.
As in [FK01], the hard part of the proof lies in proving tight asymptotic bounds on the spectrum of the graph family, based on the following result.
Theorem. ([DKK13]) Let c be a sufficiently large constant, and let C be the adjacency matrix of a balanced k-partite random graph where each edge connecting any two parts appears with probability p. Then with high probability: \max\left[ |\lambda_k(C)|, |\lambda_{n-k}(C)| \right] \leq \max\left[ c \sqrt{pn\log n},\ c\sqrt{(1-p)n\log n} \right].
Because the rest of the proof is also long and technical, it is omitted.
QED
At the moment, we are working on lifting this admittedly strong requirement, or at least relaxing it.
A very recent work, [ABH14], indicates specific p-q ranges for which the problem is solvable but hard, and gives some negative results based on information-theoretic lower bounds.
Toward this, we are currently working on a different approach with a slightly different SDP that we believe might solve the problem, potentially for an even better p-q range; in particular, for the range described as "hard" in the above work.