
Multisection in the Planted Partition Model

Konstantinos Koiliaris - UIUC

Map

  • Introduction
    • Motivation
    • Bisection & Multisection
  • Bisection
    • Model
    • Approach
    • Main Results
  • Multisection
    • Model
    • Approach
    • First Results
    • Current Status & Future Work

Motivation

Clustering, in general, is a basic primitive of statistics and machine learning.

It is a central task in a number of fields, and a great deal of work has been devoted to it.

Graph partitioning problems and especially bisection and multisection with their numerous variations are central to the study of data clustering.

Map

  • Introduction
    • Motivation
    • Bisection & Multisection
  • Bisection
    • Model
    • Approach
    • Main Results
  • Multisection
    • Model
    • Approach
    • First Results
    • Current Status & Future Work

Bisection   |   Multisection

Given a graph, we are looking for a balanced bipartition that minimizes the number of edges crossing between the two parts.

Given a graph, we are looking for a balanced k-partition that minimizes the total number of edges crossing between the k parts.

Bisection   |   Multisection

Both of these problems are NP-Hard.

Even worse: no polynomial time approximation algorithm can guarantee a finite approximation ratio for either unless P=NP.

Bisection   |   Multisection

But do these bad instances really appear in practice?

In other words, what about the average-case analysis of these problems?

The purpose of this talk is to answer these questions for the two problems and to demonstrate the techniques used.

Map

  • Introduction
    • Motivation
    • Bisection & Multisection
  • Bisection
    • Model
    • Approach
    • Main Results
  • Multisection
    • Model
    • Approach
    • First Results
    • Current Status & Future Work

Planted Partition Model

Trying to answer similar questions, and following the work of [BO87], Feige and Kilian [FK01] studied bisection under the following planted partition semirandom model:

  • Fix a bisection \mathcal{S}=\left\{S, \bar{S}\right\},
  • Include each intra-part edge with probability p,
  • Include each inter-part edge with probability q < p,
  • (Monotone) Adversary: arbitrarily removes inter-part edges and adds intra-part edges.

We refer to a graph drawn randomly from the distribution \mathcal{G}_{n,p,q} defined by the first three bullets as G_{rand}.

In this setting, the task is to recover the initial planted bisection.
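For concreteness, here is a minimal sketch of how one might sample G_{rand} from \mathcal{G}_{n,p,q} (Python/NumPy; the function name planted_partition and its parameters are illustrative choices of mine, not part of [FK01], and the monotone adversary is not modeled). It is written for a general number of parts k so the same sketch also covers the multisection model discussed later; k=2 gives the bisection model used here.

import numpy as np

def planted_partition(n, k, p, q, seed=0):
    # Sample an adjacency matrix from the planted partition distribution:
    # n vertices in k equal parts (n divisible by k is assumed), intra-part
    # edges appear with probability p, inter-part edges with probability q < p.
    # The monotone adversary is NOT modeled here.
    rng = np.random.default_rng(seed)
    labels = np.repeat(np.arange(k), n // k)       # the planted partition
    same = labels[:, None] == labels[None, :]      # True iff i and j share a part
    probs = np.where(same, p, q)                   # edge probability for each pair
    coins = rng.random((n, n)) < probs
    A = np.triu(coins, 1)                          # keep i < j only (no self-loops)
    A = (A | A.T).astype(int)                      # symmetrize
    return A, labels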

The Model

Why this model?

Average case analysis is equivalent to considering the probability of success of an algorithm on a random graph.

However, choosing graphs uniformly at random from the set of all possible graphs is not very helpful: only a small fraction of all n-vertex graphs have meaningful bisections.

By restricting our attention to this model, we are guaranteed graphs that are likely to have non-trivial solutions, are robust, and give a realistic model for data sets studied in practice.

Map

  • Introduction
    • Motivation
    • Bisection & Multisection
  • Bisection
    • Model
    • Approach
    • Main Results
  • Multisection
    • Model
    • Approach
    • First Results
    • Current Status & Future Work

The Goal

Let b(G) be the value of the minimum bisection of G: b(G)=\min_{\mathcal{S}}\left[ \left| E(S, \bar{S}) \right| \right]

The goal is to find a function h(G) such that h provides a heuristic in the sense that for many graphs: h(G)=b(G).

Robustness

Recall that after we randomly pick a graph G_{rand}, the adversary can act on it.

Definition. ([FK01]) A function h is robust with respect to monotone adversaries if for every graph G_1 and every graph G_2 obtainable by a single monotone transformation of G_1, h(G_1)=b(G_1) implies h(G_2)=b(G_2).

A monotone adversary is powerless against a robust function h; whenever G_{rand} is such that h(G_{rand})=b(G_{rand}) the monotone adversary cannot prevent h(G)=b(G).

Robustness

Proposition. ([FK01]) If a function h has the two following properties:
  • Lower Bound: h(G) \leq b(G),
  • Bounded Monotonicity: h(G) \leq h(G^+) \leq h(G)+1, where h(G^+)=h(G \cup \{e\}) for any new edge e,

then h is robust.

Proof. Let G_1 be an arbitrary graph for which h(G_1)=b(G_1), and let G_2 be a graph obtained by a single monotone transformation, using a (minimal) bisection (S, \bar{S}) of G_1 with b(G_1) edges. We need to show that h(G_2)=b(G_2).

Robustness

Proof (cont'd). The inequality h(G_2)\leq b(G_2) follows directly from the lower bound property for h. It remains to show that b(G_2)\leq h(G_2). Suppose:

  • G_2 is obtained from G_1 by adding an edge (u, v), where u, v \in S or u, v \in \bar{S}. Then b(G_2)=b(G_1). By bounded monotonicity we get: b(G_2) = b(G_1) = h(G_1) \leq h(G_2).
  • G_2 is obtained from G_1 by removing an edge (u, v), where u \in S and v \in \bar{S}. Then b(G_2)=b(G_1)-1=h(G_1)-1. By bounded monotonicity we get: h(G_2) \geq h(G_1) - 1 \geq b(G_2).

    QED

The Approach

To compute the bisection in this model, we would like to find a polytime computable function h that is robust, and in addition is "probably good":

h(G_{rand})=b(G_{rand}) with high probability over the choice of G_{rand}.

[FK01] proposed the following semidefinite relaxation of bisection.

Bisection SDP

Find an order n matrix X=\{x_{ij}\} satisfying:

\min_X h_X(G) := \sum_{(i,j) \, \in\, E} \frac{1 - x_{ij}}{2}

\mbox{subject to } \quad (\forall i \in V)\ x_{ii} = 1, \qquad \sum_{i, j \,\in\, V} x_{ij} = 0, \qquad X \ \succeq \ 0

Let h(G) be equal to the solution of the above semidefinite relaxation of bisection: h(G)=\min_X h_X(G).
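As an illustration, the relaxation can be prototyped almost verbatim with an off-the-shelf SDP modeling tool. The sketch below uses CVXPY with the SCS solver, both of which are my choices and not something [FK01] prescribes; the division by 4 in the objective accounts for each edge being counted twice in a symmetric 0/1 adjacency matrix.

import numpy as np
import cvxpy as cp

def bisection_sdp_value(A):
    # h(G): optimum of the bisection SDP relaxation, for a symmetric 0/1
    # adjacency matrix A (numpy array).
    n = A.shape[0]
    X = cp.Variable((n, n), symmetric=True)
    # sum over edges (i,j) of (1 - x_ij)/2; the symmetric A counts each edge
    # twice, hence the division by 4.
    objective = cp.Minimize(cp.sum(cp.multiply(A, 1 - X)) / 4)
    constraints = [cp.diag(X) == 1,    # x_ii = 1 for every vertex i
                   cp.sum(X) == 0,     # sum_{i,j} x_ij = 0 (balance)
                   X >> 0]             # X positive semidefinite
    problem = cp.Problem(objective, constraints)
    problem.solve(solver=cp.SCS)
    return problem.value

On small instances sampled with the planted_partition sketch above, the returned value should (for a suitable p-q gap) coincide with the planted cut size with high probability.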

Proving Robustness

By using semidefinite programming h(G) can be computed in polynomial time.

Proposition. ([FK01]) The function h is robust.

Proof. Suffices to show that h satisfies the two properties: "lower bound" and "bounded monotonicity".

"lower bound": To see that h(G)\leq b(G), consider the indicator vector \mathbf{s}=(s_1, \ldots, s_n) \in \{-1,+1\}^n where: s_i = \left\{ \begin{array}{rcl} +1 & \mbox{when} & \mbox{vertex } i \in S \\ -1 & \mbox{when} & \mbox{vertex } i \in \bar{S} \end{array}\right.

Proving Robustness

Proof (cont'd). Let X=\mathbf{s}\mathbf{s}^{\top} and notice that x_{ij}=-1 when vertices i and j are on opposite sides of the cut and x_{ij}=1 otherwise, and also X satisfies the constraints of the semidefinite program.

Moreover h_X(G)=b(G), because \frac{1-x_{ij}}{2} = \left\{ \begin{array}{rcl} 1 & \mbox{when} & x_{ij}=-1 \\ 0 & \mbox{when} & x_{ij}=1 \end{array}\right. and recall that by definition h(G) \leq h_X(G).

Proving Robustness

Proof (cont'd). "bounded monotonicity": use the fact that 0 \leq (1-x_{ij})/2 \leq 1, so for any X and any G, G^+: h_X(G) \leq h_X(G^+) \leq h_X(G)+1.

Observe that feasibility of a matrix X depends only on the number of vertices of a graph, not on the graph itself. Taking X to be an optimal solution for h(G^+) we obtain: h(G) \leq h_X(G) \leq h_X(G^+)=h(G^+), and taking X to be the optimal solution for h(G) we obtain: h(G^+) \leq h_X(G^+) \leq h_X(G)+1=h(G)+1.

QED

Duality

Just like linear programs, semidefinite programs have the notion of duality.

Moreover the optimal value of a dual maximization problem is never larger than the optimal value of the primal minimization problem.

The Dual SDP

Under the standard theory, the dual for the primal semidefinite program is:

\max \quad \frac{m}{2} + \frac{\sum_{i\in V} y_i}{4}

\mbox{subject to } \quad M = -A - y_0 J - Y \ \succeq \ 0

where m is the number of edges of the graph, A is its adjacency matrix, J is the all-ones matrix of order n, y_0 is an auxiliary variable that affects feasibility, and Y is a diagonal matrix with y_1, \ldots, y_n along its diagonal.
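For completeness, the same modeling-tool sketch for the dual (again an illustrative CVXPY formulation, not from [FK01]); by weak duality its optimum is at most h(G), and therefore at most b(G).

import numpy as np
import cvxpy as cp

def bisection_dual_value(A):
    # Dual SDP: maximize m/2 + sum_i(y_i)/4 subject to
    # M = -A - y0*J - Y being positive semidefinite (Y = diag(y)).
    n = A.shape[0]
    m = int(A.sum()) // 2                 # number of edges
    J = np.ones((n, n))
    y0 = cp.Variable()
    y = cp.Variable(n)
    M = -A - y0 * J - cp.diag(y)
    problem = cp.Problem(cp.Maximize(m / 2 + cp.sum(y) / 4), [M >> 0])
    problem.solve(solver=cp.SCS)
    return problem.value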

Map

  • Introduction
    • Motivation
    • Bisection & Multisection
  • Bisection
    • Model
    • Approach
    • Main Results
  • Multisection
    • Model
    • Approach
    • First Results
    • Current Status & Future Work

Main Lemma

Using this dual formulation, Feige and Kilian proved the following main lemma:

Lemma. ([FK01]) For a sufficiently large constant c, the function h defined above is "probably good" when p-q \geq c \sqrt{p \frac{\log n}{n}}.

That is, for the described parameter range they recover the planted bisection with high probability over the choice of G_{rand}.

Proof

Proof. We want to show that if p - q is sufficiently large, then w.h.p. the dual has a feasible solution with value b(G_{rand}). We now "guess" a feasible solution for the dual, i.e., the y_i's and y_0: for every 1 \leq i \leq n, let y_i be the contribution of vertex i to the bisection (S, \bar{S}); namely, y_i is the difference between the number of neighbors that vertex i has on the other side of the bisection and the number of neighbors it has on its own side.

Proof (cont'd)

The value of the objective function of the dual program is then: \frac{m}{2} + \frac{2 \left|E(S, \bar{S})\right| - 2\left(m - \left|E(S, \bar{S})\right|\right)}{4} = \left|E(S, \bar{S})\right|, which is b(G_{rand}), as desired. It remains to show that w.h.p. (over the choice of G_{rand}) it is possible to choose y_0 such that the matrix M = -A - y_0 J - Y is positive semidefinite.

Proof (cont'd)

We choose y_0 = -1, so that M = -A + J - Y, and show that the matrix M has no negative eigenvalues, which implies that it is positive semidefinite.

This part of the proof boils down to proving tight asymptotic bounds on the spectrum of the graph family \mathcal{G}_{n,p,q}, based on the following result.

Proof (cont'd)

Theorem. ([FK01]) Let c be a sufficiently large constant, and let B be the adjacency matrix of a random graph on n vertices in which each edge appears with probability p. Then with high probability: \max\left[ |\lambda_2(B)|, |\lambda_n(B)| \right] \leq \max\left[ c \sqrt{pn\log n},\: c\sqrt{(1-p)n\log n} \right]. Now let C be the adjacency matrix of a random balanced bipartite graph in which each edge appears with probability p. Then with high probability: \max\left[ |\lambda_2(C)|, |\lambda_{n-1}(C)| \right] \leq \max\left[ c \sqrt{pn\log n},\: c\sqrt{(1-p)n\log n} \right].

Because the rest of the proof is long and technical, it is omitted.

QED
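As a numerical sanity check of the certificate used in this proof (an illustration only, using the hypothetical planted_partition helper sketched earlier), one can build the guessed dual solution explicitly and verify that M = -A + J - Y has no negative eigenvalues when p - q is comfortably large:

import numpy as np

def fk_certificate(A, labels):
    # Build the dual solution guessed in the proof: y_i is the number of
    # neighbors of i across the planted cut minus the number on its own side,
    # y_0 = -1, so M = -A + J - Y.  Returns the smallest eigenvalue of M and
    # the dual objective m/2 + sum(y)/4, which equals the planted cut size.
    n = A.shape[0]
    same = labels[:, None] == labels[None, :]   # True iff i and j are on the same side
    y = (A * ~same).sum(axis=1) - (A * same).sum(axis=1)
    M = -A + np.ones((n, n)) - np.diag(y)
    return np.linalg.eigvalsh(M).min(), A.sum() / 4 + y.sum() / 4

For example, with A, labels = planted_partition(1000, 2, 0.5, 0.1), the first returned value should be nonnegative with high probability, and the second equals the number of edges crossing the planted bisection.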

Recap

So, in trying to answer questions on the behavior of bisection in the average case, we presented a planted partition semirandom model.

We gave a spectral method that retrieves the planted bisection exactly, with high probability over the choice of the random graph, under a constraint on the p-q parameter range.

Finally, this method is constructive, so we can actually retrieve the planted partition without too much effort.

Map

  • Introduction
    • Motivation
    • Bisection & Multisection
  • Bisection
    • Model
    • Approach
    • Main Results
  • Multisection
    • Model
    • Approach
    • First Results
    • Current Status & Future Work

Multisection

Naturally, the work of [FK01] lays the basis for a study of multisection under a more general planted partition semirandom model.

Currently, A. Kolla, N. Agarwal, and I are working on extending the result to the more general problem of multisection.

In what follows we present the main idea, the first results, and the difficulties we have encountered so far.

Planted Partition Model

The natural generalization of the planted partition semirandom model for multisection is:

  • Fix a k-partition \mathcal{P}=\{P_1, P_2, \ldots, P_k\},
  • Include each intra-part edge with probability p,
  • Include each inter-part edge with probability q < p,
  • (Monotone) Adversary: arbitrarily removes inter-part edges and adds intra-part edges.

Again, we are interested in retrieving the initial planted partition.

Map

  • Introduction
    • Motivation
    • Bisection & Multisection
  • Bisection
    • Model
    • Approach
    • Main Results
  • Multisection
    • Model
    • Approach
    • First Results
    • Current Status & Future Work

The Goal

Let m(G) be the value of the minimum multisection of G: m(G)=\min_{\mathcal{P}}\left[ \left| E(P_1, P_2, \ldots, P_k) \right| \right]

As before, our goal is to find a function \eta(G) such that \eta provides a heuristic in the sense that for many graphs: \eta(G)=m(G).

The Approach

We want to use the same method, so to compute the multisection in our model we would like to find a polytime computable function \eta that is robust, and in addition is "probably good":

\eta(G_{rand})=m(G_{rand}) with high probability over the choice of G_{rand}.

We propose the following semidefinite relaxation of multisection for \eta.

Multisection SDP

For every vector x_j of the bisection SDP, define k new vectors x_1^j, x_2^j, \ldots, x_k^j for the multisection.

\min_{\mathbf{X}} \eta_{\mathbf{X}}(G):= \frac{1}{2k} \sum_{(j_1,j_2) \in E} \sum_{i_1, i_2 \in [k]}\left\|\left(x_{i_1}^{j_1}-x_{i_2}^{j_1}\right) - \left(x_{i_1}^{j_2}-x_{i_2}^{j_2}\right)\right\|^2

\mbox{subject to } \quad (\forall j \in V) \ \sum_{i_1, i_2 \in [k]} \left\|x_{i_1}^{j} - x_{i_2}^{j} \right\| = k-1 \quad :\ y_{j} \qquad\qquad (\forall i_1, i_2 \in [k]) \ \sum_{j_1, j_2 \in V} \left(x_{i_1}^{j_1}-x_{i_2}^{j_1}\right) \left(x_{i_1}^{j_2}-x_{i_2}^{j_2}\right)=0 \quad :\ y_0

Let \eta(G) be equal to the solution of the above semidefinite relaxation of multisection: \eta(G)=\min_{\mathbf{X}} \eta_{\mathbf{X}}(G).

Recall: Bisection SDP

\min_X h_X(G) := \sum_{(i,j) \, \in\, E} \frac{1 - x_{ij}}{2}

\mbox{subject to } \quad (\forall i \in V)\ x_{ii} = 1, \qquad \sum_{i, j \,\in\, V} x_{ij} = 0, \qquad X \ \succeq \ 0

Multisection SDP

We rewrite the semidefinite program in matrix form by defining an (nk \times nk)-matrix as follows.

Let \mathbf{X} be an (n \times n) matrix of (k \times k) matrices. That is, instead of an (n \times n) matrix we have a blown-up (nk \times nk) matrix, in which each cell is replaced by a (k \times k) block.

Here \mathbf{X}[j_1, i_1, j_2, i_2] = x_{i_1}^{j_1} \cdot x_{i_2}^{j_2}. We use this notation to keep the vertex and partition indices separate.

The Matrix SDP Relaxation

Which yields:

\min \quad \frac{L\otimes \mathbb{A}}{k}\bullet \mathbf{X}

\mbox{subject to } \quad (\forall j \in V) \ \ \left(\mathbb{1}_j\otimes \mathbb{A}\right)\bullet \mathbf{X} = k-1 \quad :\ y_j \qquad\qquad \left(J\otimes \mathbb{A}\right)\bullet\mathbf{X}=0 \quad :\ y_0 \qquad\qquad \mathbf{X} \ \succeq \ 0

The Matrix SDP Relaxation

where \otimes is the standard Kronecker product, defined as: \mbox{If } A= \left[ \begin{array}{cc} a_{11} & a_{12} \\ a_{21} & a_{22} \end{array} \right] \mbox{ and } B= \left[ \begin{array}{cc} b_{11} & b_{12} \\ b_{21} & b_{22} \end{array} \right] \mbox{ then } A \otimes B = \left[ \begin{array}{cc} a_{11}\cdot B & a_{12}\cdot B \\ a_{21}\cdot B & a_{22}\cdot B \end{array} \right] = \left[ \begin{array}{ c c : c c } a_{11}\cdot b_{11} & a_{11}\cdot b_{12} & a_{12}\cdot b_{11} & a_{12}\cdot b_{12} \\ a_{11}\cdot b_{21} & a_{11}\cdot b_{22} & a_{12}\cdot b_{21} & a_{12}\cdot b_{22} \\ \hdashline a_{21}\cdot b_{11} & a_{21}\cdot b_{12} & a_{22}\cdot b_{11} & a_{22}\cdot b_{12} \\ a_{21}\cdot b_{21} & a_{21}\cdot b_{22} & a_{22}\cdot b_{21} & a_{22}\cdot b_{22} \end{array} \right]

The Matrix SDP Relaxation

\mathbb{A} is the (k \times k) Laplacian matrix of a clique of size k: \mathbb{A}= \left[ \begin{array}{cccc} k-1 & -1 & \ldots & -1 \\ -1 & k-1 & & \vdots \\ \vdots & & \ddots & -1 \\ -1 & \ldots & -1 & k-1 \end{array} \right] \, , and \mathbb{1}_j is the (n \times n) matrix that has 1 at the (j,j) position and 0 everywhere else.
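Putting the pieces together, the matrix form can be prototyped directly. The sketch below reflects my reading of the notation (\bullet as the trace inner product, with the PSD constraint on \mathbf{X} made explicit); CVXPY, the function name, and the variable names are illustrative choices, and the formulation only scales to small n and k.

import numpy as np
import cvxpy as cp

def multisection_sdp_value(A_adj, k):
    # Matrix form of the multisection SDP relaxation (small n, k only).
    n = A_adj.shape[0]
    L = np.diag(A_adj.sum(axis=1)) - A_adj       # graph Laplacian
    A_clq = k * np.eye(k) - np.ones((k, k))      # Laplacian of the k-clique
    J = np.ones((n, n))
    X = cp.Variable((n * k, n * k), PSD=True)
    constraints = [cp.trace(np.kron(J, A_clq) @ X) == 0]              # (J kron A) . X = 0
    for j in range(n):
        E_jj = np.zeros((n, n))
        E_jj[j, j] = 1.0                                              # the matrix 1_j
        constraints.append(cp.trace(np.kron(E_jj, A_clq) @ X) == k - 1)
    objective = cp.Minimize(cp.trace(np.kron(L, A_clq) @ X) / k)      # (L kron A)/k . X
    problem = cp.Problem(objective, constraints)
    problem.solve(solver=cp.SCS)
    return problem.value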

Robustness

Using the same notion of robustness introduced for bisection, we show that \eta is robust as well and so we can simplify our analysis by ignoring the adversary.

The Dual

By the notion of semidefinite duality explained before, it suffices to compute the following dual semidefinite program:

\max \quad (k-1) \sum_{j \in V} y_j

\mbox{subject to } \quad \frac{L\otimes \mathbb{A}}{k} - Y\otimes \mathbb{A} - y_0 J \otimes \mathbb{A} \ \succeq \ 0

The Dual

Notice that: \frac{L\otimes \mathbb{A}}{k} - Y\otimes \mathbb{A} - y_0 J \otimes \mathbb{A} \ \succeq \ 0 \iff \left( \frac{L}{k} - Y - y_0 J \right) \otimes \mathbb{A} \ \succeq \ 0

But \mathbb{A} is the Laplacian of a graph, and as such a positive semidefinite matrix, hence the above reduces to: M = \frac{L}{k} - Y - y_0 J \ \succeq \ 0

The Dual

This step is important not only because it simplifies the calculations, but also because M is again an (n \times n) matrix.

Moreover, we can see that for k=2 the matrix M = \frac{L}{2} - Y - y_0 J is exactly the matrix that [FK01] proved to be positive semidefinite.
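The reduction above rests on the fact that the eigenvalues of a Kronecker product are the pairwise products of the factors' eigenvalues; since \mathbb{A} is a Laplacian (hence PSD and nonzero), M \otimes \mathbb{A} \succeq 0 exactly when M \succeq 0. A tiny numerical illustration of this fact (not a proof):

import numpy as np

def kron_spectrum_demo(M, A_clq):
    # For symmetric M and A_clq, the spectrum of np.kron(M, A_clq) is the set
    # of pairwise products of their eigenvalues; this is why the PSD condition
    # on the Kronecker product reduces to a PSD condition on M alone when
    # A_clq is PSD (and nonzero).
    lam_kron = np.sort(np.linalg.eigvalsh(np.kron(M, A_clq)))
    products = np.sort(np.outer(np.linalg.eigvalsh(M),
                                np.linalg.eigvalsh(A_clq)).ravel())
    return np.allclose(lam_kron, products)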

Map

  • Introduction
    • Motivation
    • Bisection & Multisection
  • Bisection
    • Model
    • Approach
    • Main Results
  • Multisection
    • Model
    • Approach
    • First Results
    • Current Status & Future Work

First Results

Currently, using the dual formulation presented, we have shown that \eta is "probably good" for a reasonable p-q parameter range, under the strong assumption that:

Assumption. All the inter-part (q) edges form a random k-partite d-regular graph.

That is, under the above assumption, we recover the planted multisection with high probability over the choice of G_{rand}.

As in [FK01], the difficulty of the proof lies in proving tight asymptotic bounds on the spectrum of the graph family, based on the following result.

First Results

Theorem. ([DKK13]) Let c be a sufficiently large constant, and let C be the adjacency matrix of a balanced k-partite random graph in which each edge connecting two different clusters appears with probability p. Then with high probability: \max\left[ |\lambda_k(C)|, |\lambda_{n-k}(C)| \right] \leq \max\left[ c \sqrt{pn\log n},\: c\sqrt{(1-p)n\log n} \right]. Because the rest of the proof is also long and technical, it is omitted.

QED
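Purely as an illustration of the kind of spectral bound involved (and not of the omitted proof), one can sample a balanced k-partite random graph and compare its relevant eigenvalues to c\sqrt{\max(p, 1-p)\, n \log n}; the constant c=3 below is an arbitrary choice for the demo.

import numpy as np

def kpartite_spectrum_demo(n, k, p, c=3.0, seed=0):
    # Sample a balanced k-partite random graph (inter-part edges only, each
    # with probability p) and compare |lambda_k| and |lambda_{n-k}|
    # (eigenvalues in decreasing order) with c * sqrt(max(p, 1-p) * n * log n).
    rng = np.random.default_rng(seed)
    labels = np.repeat(np.arange(k), n // k)
    diff = labels[:, None] != labels[None, :]
    A = np.triu((rng.random((n, n)) < p) & diff, 1)
    A = (A | A.T).astype(float)
    lam = np.linalg.eigvalsh(A)[::-1]            # eigenvalues in decreasing order
    bound = c * np.sqrt(max(p, 1.0 - p) * n * np.log(n))
    return abs(lam[k - 1]), abs(lam[n - k - 1]), bound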

Map

  • Introduction
    • Motivation
    • Bisection & Multisection
  • Bisection
    • Model
    • Approach
    • Main Results
  • Multisection
    • Model
    • Approach
    • First Results
    • Current Status & Future Work

Where Do We Stand?

At the moment, we are working on lifting this admittedly strong requirement, or at least relaxing it.

A very recent work, [ABH14], identifies specific p-q ranges for which the problem is solvable but hard, along with some negative results based on information-theoretic lower bounds.

Towards this, we are currently working on a different approach with a slightly different SDP that we believe might solve the problem, potentially for an even better p-q range; in particular, the range described as "hard" in the above work.

Thank you

References

  • [BO87] Boppana R.: Eigenvalues and graph bisection. 28th Annual Symposium on Foundations of Computer Science. (1987) 280--285
  • [FK01] Feige U., Kilian J.: Heuristics for Semirandom Graph Problems. Journal of Computer and System Sciences. 63 (2001) 639--671
  • [S01] McSherry F.: Spectral partitioning of random graphs. 42nd Annual Symposium on Foundations of Computer Science. (2001) 529--537
  • [DKK13] Dasgupta S., Koiliaris K., Kolla, A.: Spectra of Random Graphs with Planted Partitions. Manuscript. (2013)
  • [ABH14] Abbe E., Bandeira, A., Hall G.: Exact Recovery in the Stochastic Block Model. To appear in 2014.