- Some Easy Optimization Problems Have the Overlap-Gap Property
Joint work with Tselil Schramm.
arXiv:2411.01836. We show that the shortest s-t path problem has the overlap-gap property in (i) sparse G(n,p) graphs and (ii) complete graphs with i.i.d. Exponential edge weights. Furthermore, we demonstrate that in sparse G(n,p) graphs, shortest path is solved by O(log n)-degree polynomial estimators, and a uniformly random approximate shortest path can be sampled in polynomial time. This constitutes the first example in which the overlap-gap property is not predictive of algorithmic intractability for a (non-algebraic) average-case optimization problem.
abstract arXiv
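A toy illustration of the setting (my own sketch, not code from the paper): sample a complete graph with i.i.d. Exp(1) edge weights, list the first few shortest s-t paths with Yen's algorithm, and measure the pairwise edge overlap among these near-optimal paths; the overlap-gap property concerns the possible values of exactly this kind of overlap.

```python
import itertools
import networkx as nx
import numpy as np

rng = np.random.default_rng(0)
n = 200
G = nx.complete_graph(n)
for u, v in G.edges():
    G[u][v]["weight"] = rng.exponential(1.0)  # i.i.d. Exp(1) edge weights

s, t = 0, n - 1
k = 10  # number of near-optimal paths to inspect
paths = list(itertools.islice(
    nx.shortest_simple_paths(G, s, t, weight="weight"), k))

def edge_set(path):
    return {frozenset(e) for e in zip(path, path[1:])}

# Normalized edge overlap between each pair of near-shortest paths.
for i, j in itertools.combinations(range(k), 2):
    a, b = edge_set(paths[i]), edge_set(paths[j])
    print(i, j, len(a & b) / max(len(a), len(b)))
```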
- Synthetic Continued Pretraining
Joint work with Zitong Yang, Neil Band, Emmanuel Candès, and Tatsunori Hashimoto.
arXiv:2409.07431. Pretraining on large-scale, unstructured internet text has enabled language models to acquire a significant amount of world knowledge. However, this knowledge acquisition is data-inefficient -- to learn a given fact, models must be trained on hundreds to thousands of diverse representations of it. This poses a challenge when adapting a pretrained model to a small corpus of domain-specific documents, where each fact may appear rarely or only once. We propose to bridge this gap with synthetic continued pretraining: using the small domain-specific corpus to synthesize a large corpus more amenable to learning, and then performing continued pretraining on the synthesized corpus. We instantiate this proposal with EntiGraph, a synthetic data augmentation algorithm that extracts salient entities from the source documents and then generates diverse text by drawing connections between the sampled entities. Synthetic continued pretraining using EntiGraph enables a language model to answer questions and follow generic instructions related to the source documents without access to them. If, instead, the source documents are available at inference time, we show that the knowledge acquired through our approach compounds with retrieval-augmented generation. To better understand these results, we build a simple mathematical model of EntiGraph, and show how synthetic data augmentation can 'rearrange' knowledge to enable more data-efficient learning.
abstract arXiv
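A schematic sketch of the EntiGraph pipeline as described in the abstract; `call_llm` is a hypothetical placeholder for any text-generation API, and the prompts are illustrative rather than the ones used in the paper.

```python
import itertools
import random

def call_llm(prompt: str) -> str:
    # Hypothetical stub: plug in your favorite text-generation API here.
    raise NotImplementedError

def entigraph(document: str, num_pairs: int = 100) -> list[str]:
    # Step 1: extract salient entities from the source document.
    entities = call_llm(
        f"List the salient entities in the following document, "
        f"one per line:\n{document}"
    ).splitlines()

    # Step 2: synthesize diverse text by relating sampled entity pairs
    # back to the source document.
    corpus = []
    pairs = list(itertools.combinations(entities, 2))
    for e1, e2 in random.sample(pairs, min(num_pairs, len(pairs))):
        corpus.append(call_llm(
            f"Using the document below, discuss the relationship between "
            f"'{e1}' and '{e2}'.\n\nDocument:\n{document}"
        ))
    # Continued pretraining then runs on this synthetic corpus.
    return corpus
```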
- Discrepancy Algorithms for the Binary Perceptron
Joint work with Tselil Schramm and Kangjie Zhou.
arXiv:2408.00796. The binary perceptron problem asks us to find a sign vector in the intersection of independently chosen random halfspaces with intercept −κ. We analyze the performance of the canonical discrepancy minimization algorithms of Lovett-Meka and Rothvoss/Eldan-Singh for the asymmetric binary perceptron problem. We obtain new algorithmic results in the κ=0 case and in the large-|κ| case. In the κ→−∞ case, we additionally characterize the storage capacity and complement our algorithmic results with an almost-matching overlap-gap lower bound.
abstract arXiv video
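For intuition, here is a naive greedy baseline for the symmetric variant, viewing the perceptron as a discrepancy problem: shrink max_i |<a_i, σ>| by single-spin flips. This is emphatically not the Lovett-Meka or Rothvoss/Eldan-Singh algorithm analyzed in the paper, just a minimal sketch of the objective.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 150, 75  # m = alpha * n constraints
A = rng.standard_normal((m, n))
sigma = rng.choice([-1.0, 1.0], size=n)
margins = A @ sigma  # maintained incrementally below

for _ in range(10 * n):
    # Margins after flipping coordinate j: margins - 2*sigma_j*A[:, j].
    candidate = margins[:, None] - 2.0 * A * sigma[None, :]
    best_j = np.argmin(np.max(np.abs(candidate), axis=0))
    if np.max(np.abs(candidate[:, best_j])) >= np.max(np.abs(margins)):
        break  # local minimum reached
    sigma[best_j] *= -1.0
    margins = candidate[:, best_j].copy()

# Smallest kappa this local search certifies for the symmetric problem.
print("final discrepancy kappa:", np.max(np.abs(margins)) / np.sqrt(n))
```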
- Spectral Clustering in the Gaussian Mixture Block Model
Joint work with Tselil Schramm.
arXiv:2305.00979. Gaussian mixture block models are distributions over graphs that strive to model modern networks: to generate a graph from such a model, we associate each vertex with a latent feature vector sampled from a mixture of Gaussians, and we add an edge if and only if the feature vectors are sufficiently similar. The different components of the Gaussian mixture capture the fact that there may be different types of nodes with different distributions over features---for example, in a social network each component might represent a distinct community with its own distribution over attributes. Natural algorithmic tasks associated with these networks are embedding (recovering the latent feature vectors) and clustering (grouping nodes by their mixture component). In this paper we initiate the study of clustering and embedding graphs sampled from high-dimensional Gaussian mixture block models, where the dimension of the latent feature vectors goes to infinity as the size of the network goes to infinity. This high-dimensional setting is most appropriate in the context of modern networks, in which we think of the latent feature space as being high-dimensional. We analyze the performance of canonical spectral clustering and embedding algorithms for such graphs in the case of 2-component spherical Gaussian mixtures, and begin to sketch out the information-computation landscape for clustering and embedding in these models.
abstract arXiv
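A minimal simulation of the 2-component model together with vanilla adjacency spectral clustering (a toy sketch; the inner-product similarity test and the quantile threshold are arbitrary choices of mine, made to hit a target edge density):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, sep = 600, 50, 2.0
labels = rng.integers(0, 2, size=n)  # latent community of each node
mu = np.zeros(d)
mu[0] = sep / 2
Z = rng.standard_normal((n, d)) + np.where(labels[:, None] == 0, -mu, mu)

# Add an edge iff latent features are sufficiently similar.
S = Z @ Z.T
tau = np.quantile(S[np.triu_indices(n, 1)], 0.95)  # ~5% edge density
A = (S >= tau).astype(float)
np.fill_diagonal(A, 0.0)

# Spectral clustering: sign pattern of the second-largest eigenvector.
eigvals, eigvecs = np.linalg.eigh(A)
guess = (eigvecs[:, -2] >= 0).astype(int)
acc = max(np.mean(guess == labels), np.mean(guess != labels))
print("clustering accuracy:", acc)
```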
- Binary Perceptron: Efficient Algorithms Can Find Solutions in a Rare Well-Connected Cluster
Joint work with Emmanuel Abbe and Allan Sly.
STOC 2022. It was recently shown that almost all solutions in the symmetric binary perceptron are isolated, even at low constraint densities, suggesting that finding typical solutions is hard. In contrast, some algorithms have been shown empirically to succeed in finding solutions at low density. This phenomenon has been justified numerically by the existence of subdominant and dense connected regions of solutions, which are accessible by simple learning algorithms. In this paper, we formally establish such a phenomenon for both the symmetric and asymmetric binary perceptrons. We show that at low constraint density (equivalently, for overparametrized perceptrons), there indeed exists a subdominant connected cluster of solutions with almost maximal diameter, and that an efficient multiscale majority algorithm can find solutions in such a cluster with high probability, in particular settling an open problem posed by Perkins-Xu '21. In addition, even close to the critical threshold, we show that there exist clusters of linear diameter for the symmetric perceptron, as well as for the asymmetric perceptron under additional assumptions.
abstract arXiv paper
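For a flavor of the algorithmic side, here is a one-shot majority vote at very low constraint density: each spin aligns with the majority of its column of constraint coefficients. This is a drastic simplification of the paper's multiscale majority algorithm, shown only to illustrate the starting idea; the constants below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000
m = n // 20  # low constraint density m/n
A = rng.standard_normal((m, n))

# Each spin takes the sign of the column sum of constraint coefficients.
sigma = np.sign(A.sum(axis=0))
sigma[sigma == 0] = 1.0

margins = (A @ sigma) / np.sqrt(n)
# min margin > 0 means every halfspace is satisfied (the kappa = 0 case).
print("min margin:", margins.min())
```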
- Proof of the Contiguity Conjecture and Lognormal Limit for the Symmetric Perceptron
Joint work with Emmanuel Abbe and Allan Sly.
FOCS 2021. We consider the symmetric binary perceptron model, a simple model of neural networks that has gathered significant attention in the statistical physics, information theory, and probability theory communities, with recent connections made to the performance of learning algorithms in Baldassi et al. '15. We establish that the partition function of this model, normalized by its expected value, converges to a lognormal distribution. As a consequence, this allows us to establish several conjectures for this model: (i) it proves the contiguity conjecture of Aubin et al. '19 between the planted and unplanted models in the satisfiable regime; (ii) it establishes the sharp threshold conjecture; (iii) it proves the frozen 1-RSB conjecture in the symmetric case, conjectured first by Krauth-Mézard '89 in the asymmetric case. In a recent concurrent work of Perkins-Xu [PX21], the last two conjectures were also established by proving that the partition function concentrates on an exponential scale. This left open the contiguity conjecture and the lognormal limit characterization, which are established here. In particular, our proof technique relies on a dense counterpart of the small graph conditioning method, which was developed for sparse models in the celebrated work of Robinson and Wormald.
abstract arXiv paper video
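The lognormal limit can be eyeballed numerically: for small n, brute-force the partition function Z (the number of solutions) and inspect log(Z/E[Z]) across instances. A toy sketch, with arbitrary choices of n, m, and κ; it uses that <a_i, σ>/√n is exactly N(0,1) for Gaussian rows, so E[Z] = 2^n (2Φ(κ) − 1)^m.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
n, kappa = 16, 1.0
m = n // 2
p = 2 * norm.cdf(kappa) - 1  # P(|N(0,1)| <= kappa)
EZ = 2.0**n * p**m

# All 2^n sign vectors, as rows of +-1.
sigmas = 2.0 * ((np.arange(2**n)[:, None] >> np.arange(n)) & 1) - 1.0

samples = []
for _ in range(300):
    A = rng.standard_normal((m, n))
    margins = np.abs(sigmas @ A.T) / np.sqrt(n)  # shape (2^n, m)
    Z = np.count_nonzero((margins <= kappa).all(axis=1))
    samples.append(np.log(Z / EZ))  # Z > 0 whp at these parameters

print("mean, std of log(Z/E[Z]):", np.mean(samples), np.std(samples))
```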
- Mean Field Behavior during the Big Bang for Coalescing Random Walk
Joint work with Jonathan Hermon, Dong Yao, and Lingfu Zhang.
Annals of Probability (2022). In this paper we consider the coalescing random walk model on general graphs G=(V,E). We set up a unified framework to study the leading order of the decay rate of P_t, the fraction of occupied sites at time t, and we are particularly interested in the `Big Bang' regime, where t << t_coal:=E[inf{s: there is only one particle at time s}]. Our results show that P_t satisfies a certain `mean-field behavior' if the graphs satisfy a certain `transience-like' condition. We apply this framework to two families of graphs: (1) graphs given by the configuration model with minimal degree at least 3, and (2) finite and infinite vertex-transitive graphs. In the first case, we show that for 1 << t << |V|, P_t decays on the order of t^{-1}, and tP_t is close to the probability that two particles starting from the root of the corresponding unimodular Galton-Watson tree never collide after one of them leaves the root. By taking the local weak limit, we also prove the convergence of tP_t for the corresponding unimodular Galton-Watson tree. For the second family of graphs, if we take a growing sequence of finite vertex-transitive graphs G_n=(V_n, E_n) such that the mean meeting time of two independent walkers t_meet is O(|V_n|) and the inverse of the spectral gap t_rel is o(|V_n|), we show that for t_rel << t << t_coal, tP_t = (1 + o(1))/P[two random walks never meet before time t] = 2t_meet/|V_n|. In addition, we define a certain natural `uniform transience' condition and show that, in the transitive setup, it implies the above for all 1 << t << t_coal. The equality tP_t = (1 + o(1))/P[two random walks never meet before time t] is also proved for all infinite transient transitive unimodular graphs, in particular all transient transitive amenable graphs.
abstract arXiv paper
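A toy simulation of the Big Bang regime on a random 3-regular graph, a configuration-model example covered by the paper. Note this is a discrete-time, parallel-update caricature of the continuous-time model (in particular, walkers that swap along an edge do not coalesce here), so only the rough t·P_t behavior should be read off.

```python
import networkx as nx
import numpy as np

rng = np.random.default_rng(5)
n = 5000
G = nx.random_regular_graph(3, n, seed=5)
nbrs = [list(G.neighbors(v)) for v in range(n)]

occupied = set(range(n))  # start with one walker per vertex
for t in range(1, 201):
    moved = set()
    for v in occupied:
        moved.add(nbrs[v][rng.integers(len(nbrs[v]))])
    occupied = moved  # walkers landing on the same site coalesce
    if t in (10, 25, 50, 100, 200):
        P_t = len(occupied) / n
        print(f"t={t:4d}  P_t={P_t:.4f}  t*P_t={t * P_t:.3f}")
```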
- Learning Sparse Graphons and the Generalized Kesten-Stigum Threshold
Joint work with Emmanuel Abbe and Allan Sly.
Annals of Statistics (2023). The problem of learning graphons has attracted considerable attention across several scientific communities, with significant progress in recent years in sparser regimes. Yet, the current techniques still require diverging degrees in order to succeed with efficient algorithms in the challenging cases where the local structure of the graph is homogeneous. This paper provides an efficient algorithm to learn graphons in the constant expected degree regime. The algorithm is shown to succeed in estimating the rank-k projection of a graphon in the L2 metric if the top k eigenvalues of the graphon satisfy a generalized Kesten-Stigum condition.
abstract arXiv paper video
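For contrast, here is the classical dense-regime baseline (USVT-style spectral truncation on a step-function graphon). This simple estimator needs diverging degrees, which is precisely the barrier the paper's constant-expected-degree algorithm overcomes; the block matrix below is an arbitrary example.

```python
import numpy as np

rng = np.random.default_rng(6)
n, k = 1000, 2
B = np.array([[0.6, 0.2], [0.2, 0.4]])  # step-function (SBM) graphon
z = rng.integers(0, k, size=n)
P = B[z][:, z]  # edge probability matrix, P[i, j] = B[z_i, z_j]
A = (rng.random((n, n)) < P).astype(float)
A = np.triu(A, 1)
A = A + A.T  # symmetric adjacency matrix, no self-loops

# Rank-k truncation of A estimates the rank-k part of the sampled graphon.
vals, vecs = np.linalg.eigh(A)
idx = np.argsort(-np.abs(vals))[:k]
P_hat = (vecs[:, idx] * vals[idx]) @ vecs[:, idx].T
print("relative L2 error:", np.linalg.norm(P_hat - P) / np.linalg.norm(P))
```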
- A Trace Theorem for Sobolev Spaces on the Sierpinski Gasket
Joint work with Shiping Cao, Robert S. Strichartz, and Prem Talwai.
Communications on Pure and Applied Analysis (2020). We give a discrete characterization of the trace of a class of Sobolev spaces on the Sierpinski gasket to the bottom line. This includes the L2 domain of the Laplacian as a special case. In addition, for Sobolev spaces of low orders, including the domain of the Dirichlet form, the trace spaces are Besov spaces on the line.
abstract arXiv paper
- Computing Eigenvalues and Eigenfunctions of Schrödinger Equations Using a Model Reduction Approach
Joint work with Zhiwen Zhang.
Communications in Computational Physics (2018). We propose a model reduction approach for computing eigenvalues and eigenfunctions of Schrödinger equations.
abstract paper
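For context, here is a standard finite-difference baseline for such eigenproblems in one dimension (not the paper's model-reduction method): discretize −u'' + V(x)u on [0, 1] with zero Dirichlet boundary conditions and call a sparse eigensolver; the potential below is an arbitrary example.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

N = 2000
h = 1.0 / (N + 1)
x = np.linspace(h, 1.0 - h, N)
V = 1e4 * (x - 0.5) ** 2  # harmonic-type potential well

# H = -d^2/dx^2 + V(x) via the three-point stencil on the interior grid.
H = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(N, N)) / h**2 + sp.diags(V)

# Shift-invert mode targets the eigenvalues closest to sigma = 0.
eigenvalues, eigenfunctions = eigsh(H.tocsc(), k=5, sigma=0.0)
print("smallest eigenvalues:", np.sort(eigenvalues))
```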
Last update: 2024/11.