My Google Scholar Citations Page

- Lift and Project Algorithms for Precedence Constrained Scheduling to Minimize Completion Time

*Shashwat Garg, Janardhan Kulkarni and S. Li*

SODA 2019 (to appear)

- A Polynomial Time Constant Approximation For Minimizing Total Weighted Flow-time (arxiv)[Abstract]

*Uri Feige, Janardhan Kulkarni and S. Li*

SODA 2019 (to appear)

We consider the classic scheduling problem of minimizing the total weighted flow-time on a single machine (min-WPFT), when preemption is allowed. In this problem, we are given a set of

*n*jobs, each job having a release time*r*_{j}, a processing time*p*_{j}, and a weight*w*_{j}. The flow-time of a job is defined as the amount of time the job spends in the system before it completes; that is,*F*_{j}=*C*_{j}-*r*_{j}, where*C*_{j}is the completion time of job. The objective is to minimize the total weighted flow-time of jobs.This NP-hard problem has been studied quite extensively for decades. In a recent breakthrough, Batra, Garg, and Kumar presented a

*pseudo-polynomial*time algorithm that has an*O*(1) approximation ratio. The design of a truly polynomial time algorithm, however, remained an open problem. In this paper, we show a transformation from pseudo-polynomial time algorithms to polynomial time algorithms in the context of min-WPFT. Our result combined with the result of Batra, Garg, and Kumar settles the long standing conjecture that there is a polynomial time algorithm with*O*(1)-approximation for min-WPFT. - On Facility Location with General Lower Bounds (arxiv)[Abstract]

*S. Li*

SODA 2019 (to appear)

In this paper, we give the first constant approximation algorithm for the lower bounded facility location (LBFL) problem with general lower bounds. Prior to our work, such algorithms were only known for the special case where all facilities have the same lower bound: Svitkina [Svi10] gave a 448-approximation for the special case, and subsequently Ahmadian and Swamy [AS13] improved the approximation factor to 82.6.

As in [Svi10] and [AS13], our algorithm for LBFL with general lower bounds works by reducing the problem to the capacitated facility location (CFL) problem. To handle the challenges raised by the general lower bounds, it involves more reduction steps. One main complication is that after aggregating the clients and facilities at a few locations, each of these locations may contain many facilities with different opening costs and lower bounds. To address this issue, we introduce and reduce the LBFL problem to two intermediate problems called the LBFL with penalty (LBFL-P) and the transportation with configurable supplies and demands (TCSD) problems, which in turn can be reduced to the CFL problem.

- Distributed k-Clustering for Data with Heavy Noise [Abstract]

*Xiangyu Guo and S. Li*

NIPS 2018 (Spotlight)

In this paper, we consider the

*k*-center/median/means clustering with outliers problems (or the (*k*,*z*)-center/median/means problems) in the distributed setting. Most previous distributed algorithms have their communication costs linearly depending on*z*, the number of outliers. Recently Guha et al. overcame this dependence issue by considering bi-criteria approximation algorithms that output solutions with 2*z*outliers. For the case where*z*is large, the extra*z*outliers discarded by the algorithms might be too large, considering that the data gathering process might be costly. In this paper, we improve the number of outliers to the best possible (1+*epsilon*)*z*, while maintaining the*O*(1)-approximation ratio and independence of communication cost on*z*. The problems we consider include the (*k*,*z*)-center problem, and (*k*,*z*)-median/means problems in Euclidean metrics. Implementation of the our algorithm for (*k*,*z*)-center shows that it outperforms many previous algorithms, both in terms of the communication cost and quality of the output solution. - Stochastic fairness in clustering

*David Harris, S. Li, Thomas Pensyl, Aravind Srinivason and Khoa Trinh*

NIPS 2018

- Flow-time Optimization for Concurrent Open-Shop and Precedence Constrained Scheduling Models (arxiv)[Abstract]

*Janardhan Kulkarni and S. Li*

APPROX 2018

Scheduling a set of jobs over a collection of machines is a fundamental problem that needs to be solved millions of times a day in various computing platforms: in operating systems, in large data clusters, and in data centers. Along with makespan,

*flow-time*, which measures the length of time a job spends in a system before it completes, is arguably the most important metric to measure the performance of a scheduling algorithm. In recent years, there has been a remarkable progress in understanding flow-time based objective functions in diverse settings such as unrelated machines scheduling, broadcast scheduling, multi-dimensional scheduling, to name a few.Yet, our understanding of the flow-time objective is limited mostly to the scenarios where jobs have no dependencies. On the other hand, in almost all real world applications, think of MapReduce settings for example, jobs have dependencies that need to be respected while making scheduling decisions. In this paper, we take first steps towards understanding this complex problem. In particular, we consider two classical scheduling problems that capture dependencies across jobs: 1)

*concurrent open-shop scheduling*(\os) and 2)*precedence constrained scheduling*. Our main motivation to study these problems specifically comes from their relevance to two scheduling problems that have gained importance in the context of data centers:*co-flow scheduling*and*DAG scheduling*. We design almost optimal approximation algorithms for \os and \pc, and show hardness results. - Approximating Global Optimum for Probabilistic Truth Discovery [Abstract]

*S. Li, Minwei Ye and Jinhui Xu*

COCOON 2018

**best paper award**

The problem of truth discovery arises in many areas such as database, data mining, data crowdsourcing and machine learning. It seeks trustworthy information from possibly conflicting data provided by multiple sources. Due to its practical importance, the problem has been studied extensively in recent years. Two competing models were proposed for truth discovery, weight-based model and probabilistic model. While (1+

*ε*)-approximations have already been obtained for the weight-based model, no quality guaranteed solution has been discovered yet for the probabilistic model. In this paper, we focus on the probabilistic model and formulate it as a geometric optimization problem. Based on a sampling technique and a few other ideas, we achieve the first (1 +*ε*)-approximation solution. The general technique we developed has the potential to be used to solve other geometric optimization problems. - Constant Approximation for k-Median and k-Means with Outliers via Iterative Rounding (arxiv)[Abstract]

*Ravishankar Krishnaswamy, S. Li and Sai Sandeep*

STOC 2018

In this paper, we present a novel iterative rounding framework for many clustering problems. This leads to an (

*α*_{1}+*ε*≈ 7.08 +*ε*)-approximation algorithm for*k*-median with outliers, greatly improving upon the large implicit constant approximation ratio of Chen [Chen, SODA 08]. For*k*-means with outliers, we give an (*α*_{2}+*ε*≈ 53 +*ε*)-approximation, which is the first*O*(1)-approximation for this problem. The iterative algorithm framework is very versatile; we show how it can be used to give*α*_{1}- and (*α*_{1}+*ε*)-approximation algorithms for matroid median and knapsack median problems respectively, improving upon the previous best approximations ratios of 8 [Swamy, ACM Trans. Algorithms] and 17.46 [Byrka et al, ESA 2015].The natural LP relaxation for the

*k*-median/*k*-means with outliers problem has an unbounded integrality gap. In spite of this negative result, our iterative rounding framework shows that we can round an LP solution to an almost-integral solution of small cost, in which we have at most two fractionally open facilities. Thus, the LP integrality gap arises due to the gap between almost-integral and fully-integral solutions. Then, using a pre-processing procedure, we show how to convert an almost-integral solution to a fully-integral solution losing only a constant-factor in the approximation ratio. By further using a sparsification technique, the additive factor loss incurred by the conversion can be reduced to any*ε*> 0. - Scheduling to Minimize Total Weighted Completion Time via Time-Indexed Linear Programming Relaxations (arxiv)[Abstract]

*S. Li*

FOCS 2017

**invited to a special issue of SICOMP**

We study the approximation of scheduling problems with the objective of minimizing total weighted completion time, under many different machine models: identical machine model with job precedence constraints, with uniform and non-uniform job sizes, and related machine model with job precedence constraints. For these problems, we give approximation algorithms that improve upon the previous 15 to 20-year old state-of- art results. A major theme in these results is the use of time-indexed linear programming relaxations. These are natural LP relaxations for their respective problems, but are not studied explicitly in the literature.

We also consider the scheduling problem of minimizing total weighted completion time on unrelated machines. The recent breakthrough result of [Bansal-Srinivasan-Svensson, STOC 2016] gave a (1.5 -

*c*)-approximation for the problem, based on some lift-and-project SDP relaxation. Our main result is that a (1.5 -*c*)-approximation can also be achieved using a natural and considerably simpler time-indexed LP relaxation for the problem. We hope the use of this time-indexed LP relaxation can provide new insights into this problem. - Breaking 1-1/
*e*Barrier for Non-preemptive Throughput Maximization [Abstract]

*Sungjin Im, S. Li and Benjamin Moseley*

IPCO 2017

In this paper we consider one of the most basic scheduling problems where jobs have their respective arrival times and deadlines. The goal is to schedule as many jobs as possible mph

*non-preemptively*by their respective deadlines on*m*identical parallel machines. For the last decade, the best approximation ratio known for the single machine case (*m*= 1) has been 1-1/*e*-*ε*≈ 0.632 due to [Chuzhoy-Ostrovsky-Rabani, FOCS 2001 and MOR 2006]. We break this barrier and give an improved 0.644-approximation. For the multiple machines case, we give an algorithm whose approximation guarantee becomes arbitrarily close to 1 as the number of machines increases. This improves upon the previous best 1 - 1/ (1 + 1/*m*)^{m}approximation due to [Bar-Noy tal, STOC 1999 and SICOMP 2009], which converges to 1-1/*e*as*m*goes to infinity. Our result for the multiple-machine case extends to the weighted throughput objective where jobs have different weights, and the goal is to schedule jobs with the maximum total weight. Our results show that the 1 - 1/*e*approximation factor widely observed in various coverage problems is not tight for the non-preemptive maximum throughput scheduling problem. - Constant Approximation Algorithm for Non-Uniform Capacitated Multi-Item Lot-Sizing via Strong Covering Inequalities (arxiv)[Abstract]

*S. Li*

SODA 2017

We study the non-uniform capacitated multi-item lot-sizing (CMILS) problem. In this problem, there is a set of demands over a planning horizon of

*T*time periods and all demands must be satisfied on time. We can place an order at the beginning of each period*s*, incurring an ordering cost*K*_{s}. The total quantity of all products ordered at time*s*can not exceed a given capacity*C*_{s}. On the other hand, carrying inventory from time to time incurs inventory holding cost. The goal of the problem is to find a feasible solution that minimizes the sum of ordering and holding costs.Levi et al. (Levi, Lodi and Sviridenko, Mathmatics of Operations Research 33(2), 2008) gave a 2-approximation for the problem when the capacities

*C*_{s}are the same. In this paper, we extend their result to the case of non-uniform capacities. That is, we give a constant approximation algorithm for the capacitated multi-item lot-sizing problem with general capacities.The constant approximation is achieved by adding an exponentially large set of new covering inequalities to the natural facility-location type linear programming relaxation for the problem. Along the way of our algorithm, we reduce the CMILS problem to two generalizations of the classic knapsack covering problem. We give LP-based constant approximation algorithms for both generalizations, via the iterative rounding technique.

- Tight Network Topology Dependent Bounds on Rounds of Communication (arxiv)[Abstract]

*Arkadev Chattopadhyay, Michael Langberg, S. Li and Atri Rudra*

SODA 2017

We prove tight network topology dependent bounds on the round complexity of computing well studied

*k*-party functions such as set disjointness and element distinctness. Unlike the usual case in the CONGEST model in distributed computing, we fix the function and then vary the underlying network topology. This complements the recent such results on total communication that have received some attention. We also present some applications to distributed graph computation problems.Our main contribution is a proof technique that allows us to reduce the problem on a general graph topology to a relevant two-party communication complexity problem. However, unlike many previous works that also used the same high level strategy, we do not reason about a two-party communication problem that is induced by a cut in the graph. To stitch back the various lower bounds from the two party communication problems, we use the notion of timed graph that has seen prior use in network coding. Our reductions use some tools from Steiner tree packing and multi-commodity flow problems that have a delay constraint.

- Better Unrelated Machine Scheduling for Weighted Completion Time via Random Offsets from Non-Uniform Distributions (arxiv)[Abstract]

*Sungjin Im and S. Li*

FOCS 2016

In this paper we consider the classic scheduling problem of minimizing total weighted completion time on unrelated machines when jobs have release times, i.e, R | r

_{ij}| Σ_{j}w_{j}C_{j}using the three-field notation. For this problem, a 2-approximation is known based on a novel convex programming (J. ACM 2001 by Skutella). It has been a long standing open problem if one can improve upon this 2-approximation (Open Problem 8 in J. of Sched. 1999 by Schuurman and Woeginger). We answer this question in the affirmative by giving a 1.8786-approximation. We achieve this via a surprisingly simple linear programming, but a novel rounding algorithm and analysis. A key ingredient of our algorithm is the use of random offsets sampled from non-uniform distributions.We also consider the preemptive version of the problem, i.e, R | r

_{ij},pmtn | Σ_{j}w_{j}C_{j}. We again use the idea of sampling offsets from non-uniform distributions to give the first better than 2-approximation for this problem. This improvement also requires use of a configuration LP with variables for each job's complete schedules along with more careful analysis. For both non-preemptive and preemptive versions, we break the approximation barrier of 2 for the first time. - Constant Approximation for Capacitated
*k*-Median with (1+*ε*)-Capacity Violation (arxiv)[Abstract]

*Gokalp Demirci and S. Li*

ICALP 2016

We study the Capacitated k-Median problem for which existing constant-factor approximation algorithms are all pseudo-approximations that violate either the capacities or the upper bound k on the number of open facilities. Using the natural LP relaxation for the problem, one can only hope to get the violation factor down to 2. Li [SODA'16] introduced a novel LP to go beyond the limit of 2 and gave a constant-factor approximation algorithm that opens (1+

*ε*)k facilities.We use the configuration LP of Li [SODA'16] to give a constant-factor approximation for the Capacitated k-Median problem in a seemingly harder configuration: we violate only the capacities by 1+

*ε*. This result settles the problem as far as pseudo-approximation algorithms are concerned. - Improved Approximation for Node-Disjoint Paths in Planar Graphs (arxiv)[Abstract]

*Julia Chuzhoy, David Kim and S. Li*

STOC 2016

We study the classical Node-Disjoint Paths (NDP) problem: given an n-vertex graph G and a collection M={(s

_{1},t_{1}),…,(s_{k},t_{k})} of pairs of vertices of G called demand pairs, find a maximum-cardinality set of node-disjoint paths connecting the demand pairs. NDP is one of the most basic routing problems, that has been studied extensively. Despite this, there are still wide gaps in our understanding of its approximability: the best currently known upper bound of O(sqrt n) on its approximation ratio is achieved via a simple greedy algorithm, while the best current negative result shows that the problem does not have a better than Ω(log^{1/2−δ}n)-approximation for any constant*δ*, under standard complexity assumptions. Even for planar graphs no better approximation algorithms are known, and to the best of our knowledge, the best negative bound is APX-hardness. Perhaps the biggest obstacle to obtaining better approximation algorithms for NDP is that most currently known approximation algorithms for this type of problems rely on the standard multicommodity flow relaxation, whose integrality gap is Ω(sqrt n) for NDP, even in planar graphs.In this paper, we break the barrier of O(sqrt n) on the approximability of the NDP problem in planar graphs and obtain an tildeO(n

^{9/19})-approximation. We introduce a new linear programming relaxation of the problem, and a number of new techniques, that we hope will be helpful in designing more powerful algorithms for this and related problems. - Approximating Capacitated
*k*-Median with (1+*ε*)*k*Open Facilities (arxiv)[Abstract]

*S. Li*

SODA 2016

In this paper, we give a constant approximation for capacitated k-median (CKM), by violating the cardinality constraint by a factor of 1 +

*ε*. This generalizes the result of [Li15], which only works for the uniform capacitated case. Our algorithm gives the first constant approximation for general CKM that only opens (1 +*ε*)k facilities. Indeed, most previous algorithms for CKM are based on the natural LP relaxation for the problem, which has unbounded integrality gap even if (2 -*ε*)k facilities can be opened.Instead, our algorithm is based on a novel configuration LP for the problem. For each set B of potential facilities, we try to characterize the convex hull of all valid integral solutions restricted to the instance defined by B and all clients. In order to reduce the size of the polytope, we cut the polytope into two parts: conditioned on the event that a few facilities are open in B we have the exact polytope; conditioned on the event that many facilities are open, we only have a relaxed polytope.

This LP can not be solved efficiently as there are exponential number of sets B. Instead, we use the standard trick: given a fractional solution, our rounding algorithm either succeeds, or finds a set B such that the fractional solution is not valid for the set B. This allows us to combine the rounding algorithm with the ellipsoid method.

- On (1,
*ε*)-Restricted Asgsignment Makespan Minimization (arxiv)[Abstract]

*Deeparnab Chakrabarty, Sanjeev Khanna and S. Li*

SODA 2015

Makespan minimization on unrelated machines is a classic problem in approximation algorithms. No polynomial time (2 -

*δ*)-approximation algorithm is known for the problem for constant*δ*> 0. This is true even for certain special cases, most notably the**restricted assignment**problem where each job has the same load on any machine but can be assigned to one from a specified subset. Recently in a breakthrough result, Svensson [Sve11] proved that the integrality gap of a certain configuration LP relaxation is upper bounded by 1.95 for the restricted assignment problem; however, the rounding algorithm is**not known**to run in polynomial time.In this paper we consider the (1,

*ε*)-restricted assignment problem where each job is either heavy (p_{j}= 1) or light (p_{j}=*ε*), for some parameter*ε*> 0. Our main result is a (2-*δ*)-approximate**polynomial time**algorithm for the (1,*ε*)-restricted assignment problem for a fixed constant*δ*> 0. Even for this special case, the best polynomial-time approximation factor known so far is 2. We obtain this result by rounding the configuration LP relaxation for this problem. A simple reduction from vertex cover shows that this special case remains NP-hard to approximate to within a factor better than 7/6. - A Dynamic Programming Framework for Non-Preemptive Scheduling Problems on Multiple Machines [Abstract]

*Sungjin Im, S. Li, Benjamin Moseley and Eric Torng*

SODA 2015

In this paper, we consider a variety of scheduling problems where jobs with release times are to be scheduled emph

*non-preemptively*on a set of identical machines. The problems considered are machine minimization, (weighted) throughput maximization and min-sum objectives such as (weighted) flow time and (weighted) tardiness.We develop a novel quasi-polynomial time dynamic programming framework that gives O(1)-speed O(1)-approximation algorithms for the offline versions of machine minimization and min-sum problems. For the weighted throughput problem, the framework gives a (1+

*ε*)-speed (1-*ε*)-approximation algorithm. The generic DP is based on improving a aive exponential time DP by developing a sketching scheme that compactly and accurately approximates parameters used in the DP states. We show that the loss of information due to the sketching scheme can be offset with limited resource augmentation. This framework is powerful and flexible, allowing us to apply it to this wide range of scheduling objectives and settings. We also provide new insight into the relative power of speed augmentation versus machine augmentation for non-preemptive scheduling problems; specifically, we give new evidence for the power and importance of extra speed for some non-preemptive scheduling problems.This novel DP framework leads to many new algorithms with improved results that solve many open problems, albeit with quasi-polynomial running times. We highlight our results as follows. For the problems with min-sum objectives, we give the first O(1)-speed O(1)-approximation algorithms for the multiple-machine setting. Even for the single machine case, we reduce both the resource augmentation required and the approximation ratios. In particular, our approximation ratios are either 1 or 1+

*ε*. Most of our algorithms use speed 1+*ε*or 2+*ε*. We also resolve an open question (albeit with a quasi-polynomial time algorithm) of whether less than 2-speed could be used to achieve an O(1)-approximation for flow time. New techniques are needed to address this open question since it was proven that previous techniques are insufficient. We answer this open question by giving an algorithm that achieves a (1+*ε*)-speed 1-approximation for flow time and (1+*ε*)-speed (1+*ε*)-approximation for weighted flow time.For the machine minimization problem, we give the first result using constant resource augmentation by showing a (1+

*ε*)-speed 2-approximation, and the first result only using speed augmentation and no additional machines by showing a (2+*ε*)-speed 1-approximation. We complement our positive results for machine minimization by considering the discrete variant of the problem and show that no algorithm can use speed augmentation less than 2^{^}(log^{1-ε}n) and achieve approximation less than (log log n) for any constant*ε*> 0 unless NP admits quasi-polynomial time optimal algorithms. Thus, our results show a stark contrast between the two settings. In one, constant speed augmentation is sufficient whereas in the other, speed augmentation is essentially not effective. - On Uniform Capacitated
*k*-Median beyond the Natural LP Relaxation (arxiv)[Abstract]

*S. Li*

SODA 2015

In this paper, we study the uniform capacitated

*k*-median problem. Obtaining a constant approximation algorithm for this problem is a notorious open problem; most previous works gave constant approximations by either violating the capacity constraint or the cardinality constraint. Notably, all these algorithms are based on the natural LP-relaxation for the problem. The LP-relaxation has unbounded integrality gap, even when we are allowed to violate the capacity constraint or the cardinality constraint by a factor of 2 -*ε*.Our result is an exp(

*O*(1/*ε*^{2}))-approximation algorithm for the problem that violates the cardinality constraint by a factor of 1 +*ε*. That is, we find a solution that opens at most (1 +*ε*)*k*facilities whose cost is at most exp(*O*(1/*ε*^{2})) times the optimum solution when at most*k*facilities can be open. This is already beyond the capability of the natural LP relaxation, as it has unbounded integrality gap even if we are allowed to open (2 -*ε*)*k*facilities. Indeed, our result is based on a novel LP for this problem. We hope that this LP is the first step towards a constant approximation for capacitated*k*-median. - Better Algorithms and Hardness for Broadcast Scheduling via a Discrepancy Approach [Abstract]

*Nikhil Bansal, Moses Charikar, Ravishankar Krishnaswamy and S. Li*

SODA 2014

We study the broadcast scheduling problem with the objective of minimizing the average response time. There is a single server that can hold n pages of unit size, and multiple requests for these pages arrive over time. At each time slot the server can broadcast one page which satisfies all the outstanding requests for this page at that time. The goal is to find a schedule to minimize the average response time of the requests, i.e.~the duration since a request arrives until it is satisfied.

We give an tilde

*O*(log^{1.5}n) approximation algorithm for the problem improving upon the previous tilde*O*(log^{2}n) approximation. We also show an Ω(log^{1/2-ε }n) hardness result, and an integrality gap of Ω(log n) for the natural LP relaxation for the problem. Prior to our work, only NP-Hardness and a (tiny) constant integrality gap was known. These results are based on establishing a close connection to the discrepancy minimization problem for permutation set-systems. Specifically, our improved approximation is based on using recent algorithmic ideas developed for discrepancy minimization. Our integrality gap is obtained from the Ω(log n)-lower bound on the discrepancy of 3-permutations, while our hardness result is based on establishing the first hardness result for the discrepancy of ell-permutations. - A Constant Factor Approximation Algorithm for Fault-Tolerant
*k*-Median (arxiv)[Abstract]

*Mohammadtaghi Hajiaghayi, Wei Hu, Jian Li, S. Li and Barna Saha*

SODA 2014

We consider the fault-tolerant generalization of the classical k-median problem with non-uniform demands, i.e., the demands r

_{j}for different clients may be different. We give the first constant factor approximation algorithm for this problem. Prior to our work, the only known approximation algorithm for the problem achieves a logarithmic approximation ratio and constant factor approximation algorithms are only known for the uniform demand version. We also present the first polynomial time algorithm for the non-uniform fault-tolerant k-median problem on a path or a HST by showing that the corresponding LP always has an integral optimal solution.We also consider the fault-tolerant facility location problem, where each client j needs to be connected to r

_{j}open facilities and the service cost of j is a weighted sum of its distance to the r_{j}facilities. Our result is a constant factor approximation algorithm, generalizing several previous results which only work for nonincreasing weight vectors. Our approximation algorithm is based on LP-rounding, and leverages a different objective function and the*em knapsack covering*constraints. - Capacitated Network Design on Undirected Graphs [Abstract]

*Deeparnab Chakrabarty, Ravishankar Krishnaswamy, S. Li and Srivatsan Narayanan*

APPROX and RANDOM 2013

In this paper, we study the approximability of

*the capacitated network design problem*(Cap-NDP) on*undirected*graphs: Given G = (V,E) with non-negative costs c and capacities u on its edges, source-sink pairs (s_{i},t_{i}) with demand r_{i}, the goal is to find the minimum cost subgraph where the minimum (s_{i},t_{i}) cut with u-capacities is at least r_{i}. When u ≡ 1, we get usual SNDP for which Jain gave a 2-approximation algorithm. Prior to our work, the approximability of undirected Cap-NDP was not well understood even in the single source-sink pair case.An important special case of single source-sink pair Cap-NDP is the

*source location problem*. Given a graph, a collection of sources S, a sink t, and a flow requirement of R, find a set S' ⊆ S of minimum cardinality such that flow(S',t), the maximum flow from S' to t is at least R. We give an O(1)-approximation when flow(s,t) ≈ flow(s',t) for s, s' in S, that is, all sources have similar max-flow values to the sink. To complement the above result, we show when the max-flow values are not similar, the problem is set-cover hard. In fact, the hardness reduction generalizes to show that the single source-sink pair Cap-NDP is label-cover hard in undirected graphs.The main technical ingredient of our algorithmic result is the following theorem which may have other application. Given a capacitated. undirected graph G with a dedicated sink t, call a subset X ⊆ V

*irreducible*if the maximum flow f(X) from X to t is strictly greater than that from any strict subset X' ⊂ X, to t. We prove that for any irreducible set, X, the flow f(X)≥(1/2)Σ_{i ∈ X}f_{i}, where f_{i}is the max-flow from i to t. That is, undirected flows are quasi-additive on irreducible sets. - Approximating
*k*-Median via Pseudo-Approximation (arxiv)[Abstract]

*S. Li and Ola Svensson*

STOC 2013

We present a novel approximation algorithm for k-median that achieves an approximation guarantee of 1+sqrt(3)+

*ε*, improving upon the decade-old ratio of 3+*ε*. Our approach is based on two components, each of which, we believe, is of independent interest. First, we show that in order to give an*α*-approximation algorithm for k-median, it is sufficient to give a*pseudo-approximation algorithm*that finds an*α*-approximate solution by opening k+O(1) facilities. This is a rather surprising result as there exist instances for which opening k+1 facilities may lead to a significant smaller cost than if only k facilities were opened.Second, we give such a pseudo-approximation algorithm with

*α*=1+sqrt(3)+*ε*. Prior to our work, it was not even known whether opening k + o(k) facilities would help improve the approximation ratio. - A Polylogarithmic Approximation for Edge-Disjoint-Paths with Congestion 2 (arxiv)[Abstract]

*Julia Chuzhoy and S. Li*

FOCS 2012

**co-winner of best paper award**

In the Edge-Disjoint Paths with Congestion problem (EDPwC), we are given an undirected n-vertex graph G, a collection M={(s

_{1},t_{1}), ... ,(s_{k},t_{k})} of demand pairs and an integer c. The goal is to connect the maximum possible number of the demand pairs by paths, so that the maximum edge congestion - the number of paths sharing any edge - is bounded by c. When the maximum allowed congestion is c=1, this is the classical Edge-Disjoint Paths problem (EDP).The best current approximation algorithm for EDP achieves an O(sqrt(n))-approximation, by rounding the standard multi-commodity flow relaxation of the problem. This matches the Ω(sqrt n) lower bound on the integrality gap of this relaxation. We show an O(polylog k)-approximation algorithm for EDPwC with congestion c=2, by rounding the same multi-commodity flow relaxation. This gives the best possible congestion for a sub-polynomial approximation of EDPwC via this relaxation. Our results are also close to optimal in terms of the number of pairs routed, since EDPwC is known to be hard to approximate to within a factor of tildeΩ(log n)

^{1/(c+1)}) for any constant congestion c. Prior to our work, the best approximation factor for EDPwC with congestion 2 was tildeO(n^{3/7}), and the best algorithm achieving a polylogarithmic approximation required congestion 14. - A Dependent LP-Rounding Approach for the
*k*-Median Problem (extended)[Abstract]

*Moses Charikar and S. Li*

ICALP 2012

In this paper, we revisit the classical k-median problem. Using the standard LP relaxation for k-median, we give an efficient algorithm to construct a probability distribution on sets of k centers that matches the marginals specified by the optimal LP solution. Analyzing the approximation ratio of our algorithm presents significant technical difficulties: we are able to show an upper bound of 3.25. While this is worse than the current best known 3+

*ε*guarantee of [AGK01], because: (1) it leads to 3.25 approximation algorithms for some generalizations of the k-median problem, including the k-facility location problem introduced in [JV01], (2) our algorithm runs in tilde*O*(k^{3}n^{2}/*δ*^{2}) time to achieve 3.25(1+*δ*)-approximation compared to the O(n^{8}) time required by the local search algorithm of [AGK01] to guarantee a 3.25 approximation, and (3) our approach has the potential to beat the decade old bound of 3+*ε*for k-median.We also give a 34-approximation for the knapsack median problem, which greatly improves the approximation constant in [Kum12]. Using the same technique, we also give a 9-approximation for matroid median problem introduced in [KKN11], improving on their 16-approximation.

- Approximation Algorithms and Hardness of Integral Concurrent Flow (extended)[Abstract]

*Parinya Chalermsook, Julia Chuzhoy, Alina Ene and S. Li*

STOC 2012

We study an integral counterpart of the classical MCF problem, that we call

*Integral Concurrent Flow*(ICF). In the basic version of this problem (basicICF), we are given an undirected n-vertex graph G with edge capacities c(e), a subset T of vertices called terminals, and a demand D(t,t') for every pair (t,t') of the terminals. The goal is to find the maximum value*λ*, and a collection pset of paths, such that every pair (t,t') of terminals is connected by ⌊*λ*D(t,t')⌋ paths in pset, and the number of paths containing any edge e is at most c(e). We show an algorithm that achieves a polylog n-approximation for basicICF, while violating the edge capacities by only a constant factor. We complement this result by proving that no efficient algorithm can achieve a factor*α*-approximation with congestion c for any values*α*,c satisfying*α*c=O(loglog n/logloglog n), unless NP subseteq ZPTIME(n^{poly log n}).We then turn to study the more general group version of the problem (groupICF), in which we are given a collection {(S

_{1},T_{1}), ... ,(S_{k},T_{k})} of pairs of vertex subsets, and for each 1leq ileq k, a demand D_{i}is specified. The goal is to find the maximum value*λ*and a collection pset of paths, such that for each i, at least ⌊*λ*D_{i}⌋ paths connect the vertices of S_{i}to the vertices of T_{i}, while respecting the edge capacities. We show that for any 1 ≤ c ≤ O(loglog n), no efficient algorithm can achieve a factor O(n^{^}((1/2)^{c+3}))-approximation with congestion c for the problem, unless NP subseteq DTIME(n^{O(log log n)}). On the other hand, we show an efficient randomized algorithm that finds polylog n-approximate solutions with constant congestion if we are guaranteed that the optimal solution contains at least D≥ k polylog(n) paths connecting every pair (S_{i},T_{i}). - A 1.488-Approximation Algorithm for the Uncapacitated Facility Location Problem [Abstract]

*S. Li*

ICALP 2011

**best student paper of Track A**

We present a 1.488 approximation algorithm for the metric uncapacitated facility location (UFL) problem. Previously the best algorithm was due to Byrka. By linearly combining two algorithms A1(

*γ*_{f}) for*γ*_{f}approx 1.6774 and the (1.11,1.78)-approximation algorithm A2 proposed by Jain, Mahdian and Saberi, Byrka gave a 1.5 approximation algorithm for the UFL problem. We show that if*γ*_{f}is randomly selected from some distribution, the approximation ratio can be improved to 1.488. Our algorithm cuts the gap with the 1.463 approximability lower bound by almost 1/3. - Vertex Sparsifiers and Abstract Rounding Algorithms (arxiv)[Abstract]

*Moses Charikar, Tom Leighton, S. Li and Ankur Moitra*

FOCS 2010

The notion of vertex sparsification (in particular cut-sparsification) is introduced in, where it was shown that for any graph G = (V, E) and any subset of k terminals K subset V, there is a polynomial time algorithm to construct a graph H = (K, E

_{H})*on just the terminal set*so that simultaneously for all cuts (A, K-A), the value of the minimum cut in G separating A from K - A is approximately the same as the value of the corresponding cut in H. Then approximation algorithms can be run directly on H as a proxy for running on G.We give the first super-constant lower bounds for how well a cut-sparsifier H can simultaneously approximate all minimum cuts in G. We prove a lower bound of Ω(log

^{1/4}k) -- this is polynomially-related to the known upper bound of O(log k/log log k). Independently, a similar lower bound is given in [MM10]. This is an exponential improvement on the Ω(log log k) bound given in [LM10] which in fact was for a stronger vertex sparsification guarantee, and did not apply to cut sparsifiers.Despite this negative result, we show that for many natural optimization problems, we do not need to incur a multiplicative penalty for our reduction. Roughly, we show that any rounding algorithm which also works for the 0-extension relaxation can be used to construct good vertex-sparsifiers for which the optimization problem is easy. Using this, we obtain optimal O(log k)-competitive Steiner oblivious routing schemes. We also demonstrate that for a wide range of graph packing problems (which includes maximum concurrent flow, maximum multiflow and multicast routing, among others, as a special case), the integrality gap of the linear program is always at most O(log k) times the integrality gap restricted to trees. Lastly, we use our ideas to give an efficient construction for vertex-sparsifiers that match the current best existential results -- this was previously open. Our algorithm makes novel use of Earth-mover constraints.

- Capacity of Large Scale Wireless Networks under Gaussian Channel Model

*S. Li, Xiang-Yang Li and YunHao Liu*

Mobicom 08

- A Polylogarithmic Approximation Algorithm for Edge-Disjoint Paths with Congestion 2 [Abstract]

*Julia Chuzhoy and S. Li*

Journal of ACM 63(5), Dec 2016

In the Edge-Disjoint Paths with Congestion problem (EDPwC), we are given an undirected n-vertex graph G, a collection M={(s

_{1},t_{1}), ... ,(s_{k},t_{k})} of demand pairs and an integer c. The goal is to connect the maximum possible number of the demand pairs by paths, so that the maximum edge congestion - the number of paths sharing any edge - is bounded by c. When the maximum allowed congestion is c=1, this is the classical Edge-Disjoint Paths problem (EDP).The best current approximation algorithm for EDP achieves an O(sqrt(n))-approximation, by rounding the standard multi-commodity flow relaxation of the problem. This matches the Ω(sqrt n) lower bound on the integrality gap of this relaxation. We show an O(polylog k)-approximation algorithm for EDPwC with congestion c=2, by rounding the same multi-commodity flow relaxation. This gives the best possible congestion for a sub-polynomial approximation of EDPwC via this relaxation. Our results are also close to optimal in terms of the number of pairs routed, since EDPwC is known to be hard to approximate to within a factor of tildeΩ(log n)

^{1/(c+1)}) for any constant congestion c. Prior to our work, the best approximation factor for EDPwC with congestion 2 was tildeO(n^{3/7}), and the best algorithm achieving a polylogarithmic approximation required congestion 14. - On the Computational Complexity of Minimum-Concave-Cost Flow in a Two-Dimensional Grid [Abstract]

*Shabbir Ahmed, Qie He, S. Li and George Nemhauser*

SIAM Journal on Optimization (to appear)

We study the minimum-concave-cost flow problem on a two-dimensional grid. We characterize the computational complexity of this problem based on the number of rows and columns of the grid, the number of different capacities over all arcs, and the location of sources and sinks. The concave cost over each arc is assumed to be evaluated through an oracle machine, i.e., the oracle machine returns the cost over an arc in a single computational step, given the flow value and the arc index. We propose an algorithm whose running time is polynomial in the number of columns of the grid, for the following cases: (1) the grid has a constant number of rows, a constant number of different capacities over all arcs, and sources and sinks in at most two rows; (2) the grid has two rows and a constant number of different capacities over all arcs connecting rows; (3) the grid has a constant number of rows and all sources in one row, with infinite capacity over each arc. These are presumably the most general polynomially solvable cases, since we show the problem becomes NP-hard when any condition in these cases is removed. Our results apply to abundant variants and generalizations of the dynamic lot sizing model, and answer several questions raised in serial supply chain optimization.

- On Uniform Capacitated
*k*-Median beyond the Natural LP Relaxation [Abstract]

*S. Li*

Transactions on Algorithms 13(2), Jan 2017

In this paper, we study the uniform capacitated

*k*-median problem. Obtaining a constant approximation algorithm for this problem is a notorious open problem; most previous works gave constant approximations by either violating the capacity constraint or the cardinality constraint. Notably, all these algorithms are based on the natural LP-relaxation for the problem. The LP-relaxation has unbounded integrality gap, even when we are allowed to violate the capacity constraint or the cardinality constraint by a factor of 2 -*ε*.Our result is an exp(

*O*(1/*ε*^{2}))-approximation algorithm for the problem that violates the cardinality constraint by a factor of 1 +*ε*. That is, we find a solution that opens at most (1 +*ε*)*k*facilities whose cost is at most exp(*O*(1/*ε*^{2})) times the optimum solution when at most*k*facilities can be open. This is already beyond the capability of the natural LP relaxation, as it has unbounded integrality gap even if we are allowed to open (2 -*ε*)*k*facilities. Indeed, our result is based on a novel LP for this problem. We hope that this LP is the first step towards a constant approximation for capacitated*k*-median. - Approximating
*k*-Median via Pseudo-Approximation [Abstract]

*S. Li and Ola Svensson*

SIAM Journal on Computing 45(2), 2016

**invited to a special issue of SICOMP**

We present a novel approximation algorithm for k-median that achieves an approximation guarantee of 1+sqrt(3)+

*ε*, improving upon the decade-old ratio of 3+*ε*. Our approach is based on two components, each of which, we believe, is of independent interest. First, we show that in order to give an*α*-approximation algorithm for k-median, it is sufficient to give a*pseudo-approximation algorithm*that finds an*α*-approximate solution by opening k+O(1) facilities. This is a rather surprising result as there exist instances for which opening k+1 facilities may lead to a significant smaller cost than if only k facilities were opened.Second, we give such a pseudo-approximation algorithm with

*α*=1+sqrt(3)+*ε*. Prior to our work, it was not even known whether opening k + o(k) facilities would help improve the approximation ratio. - A Constant Factor Approximation Algorithm for Fault-Tolerant
*k*-Median [Abstract]

*Mohammadtaghi Hajiaghayi, Wei Hu, Jian Li, S. Li and Barna Saha*

Transactions on Algorithms 12(3), June 2016

We consider the fault-tolerant generalization of the classical k-median problem with non-uniform demands, i.e., the demands r

_{j}for different clients may be different. We give the first constant factor approximation algorithm for this problem. Prior to our work, the only known approximation algorithm for the problem achieves a logarithmic approximation ratio and constant factor approximation algorithms are only known for the uniform demand version. We also present the first polynomial time algorithm for the non-uniform fault-tolerant k-median problem on a path or a HST by showing that the corresponding LP always has an integral optimal solution.We also consider the fault-tolerant facility location problem, where each client j needs to be connected to r

_{j}open facilities and the service cost of j is a weighted sum of its distance to the r_{j}facilities. Our result is a constant factor approximation algorithm, generalizing several previous results which only work for nonincreasing weight vectors. Our approximation algorithm is based on LP-rounding, and leverages a different objective function and the*em knapsack covering*constraints. - Traffic Congestion in Expanders and (
*p*,*δ*)-Hyperbolic Spaces [Abstract]

*S. Li and Gabriel Tucci*

Internet Mathematics 11(2), 2015

In this paper we define the notion of (p,

*δ*)--Gromov hyperbolic space where we relax Gromov*it slimness*condition to allow that not all but a positive fraction of all the triangles are*δ*--slim. Furthermore, we study their traffic congestion under geodesic routing. We also construct a constant degree family of expanders with congestion Θ(n^{2}) in contrast to random regular graphs that have congestion O(nlog^{3}(n)). - A 1.488-Approximation Algorithm for the Uncapacitated Facility Location Problem [Abstract]

*S. Li*

Information and Computation 222, January 2013

We present a 1.488 approximation algorithm for the metric uncapacitated facility location (UFL) problem. Previously the best algorithm was due to Byrka. By linearly combining two algorithms A1(

*γ*_{f}) for*γ*_{f}approx 1.6774 and the (1.11,1.78)-approximation algorithm A2 proposed by Jain, Mahdian and Saberi, Byrka gave a 1.5 approximation algorithm for the UFL problem. We show that if*γ*_{f}is randomly selected from some distribution, the approximation ratio can be improved to 1.488. Our algorithm cuts the gap with the 1.463 approximability lower bound by almost 1/3.