I am currently on the job market.
[CV], [Research Statement], [Teaching Statement]
I am a Ph.D. student in the ECE department at the University of Southern California, where I am a member of the Networked Systems Lab and very fortunate to be advised by Prof. Ramesh Govindan. My research lies at the intersection of theory, systems, and machine learning: I study complex systems and work to enhance their performance and availability at scale. Toward this goal, I develop (1) practical, scalable algorithms and optimization methods with formal guarantees, and (2) systems and methods that identify, explain, and effectively address performance gaps in both handcrafted and learning-enabled heuristics.
Before joining USC, I completed my B.Sc. in Electrical Engineering at Sharif University of Technology in 2019. During my undergraduate studies, I worked on the cloudification of telecommunication networks by leveraging SDN and NFV.
Feel free to drop me an email if you have any questions or want to discuss new ideas!
Graduate Research Assistant (Aug 2019 - present)
University of Southern California, Los Angeles, CA
Research Intern (Sep 2023 - present)
Google, Sunnyvale, CA
Mentors: Devdeep Ray, Yuliang Li, KK Yap, Nandita Dukkipati
Research Intern & Student Researcher (Jun 2022 - Jan 2023)
Microsoft Research, Redmond, WA
Mentors: Behnaz Arzani, Ryan Beckett, Srikanth Kandula
Research Intern & Student Researcher (Jun 2021 - Jan 2022)
Microsoft Research, Redmond, WA
Mentors: Behnaz Arzani, Dan Crankshaw, Srikanth Kandula
We consider the max-min fair resource allocation problem. The best-known solutions use either a sequence of optimizations or waterfilling, which only applies to a narrow set of cases. These solutions have become a practical bottleneck in WAN traffic engineering and cluster scheduling, especially at larger problem sizes. We improve both approaches: (1) we show how to convert the optimization sequence into a single fast optimization, and (2) we generalize waterfilling to the multi-path case. We empirically show our new algorithms Pareto-dominate prior techniques: they produce faster, fairer, and more efficient allocations. Some of our allocators also have theoretical guarantees: they trade off a bounded amount of unfairness for faster allocation. We have deployed our allocators in Azure’s WAN traffic engineering pipeline, where we preserve solution quality and achieve a roughly 3× speedup.
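For intuition, here is a minimal sketch of the classic single-path waterfilling baseline on toy data (our contributions, the single-shot optimization and the multi-path generalization, are not shown):

```python
def waterfill(links, flows):
    """links: {link: capacity}; flows: {flow: set of links it traverses}.
    Returns the max-min fair rate of every flow (toy, single-path only)."""
    rate, cap, active = {}, dict(links), dict(flows)
    while active:
        # Fair share each link can still offer its remaining flows.
        share = {}
        for link in cap:
            n = sum(link in path for path in active.values())
            if n:
                share[link] = cap[link] / n
        bottleneck = min(share, key=share.get)
        r = share[bottleneck]
        frozen = [f for f, path in active.items() if bottleneck in path]
        for f in frozen:
            rate[f] = r                    # flows on the bottleneck saturate
            for link in active[f]:
                cap[link] -= r             # free their share on other links
            del active[f]
    return rate

print(waterfill({"A": 10, "B": 4}, {"f1": {"A"}, "f2": {"A", "B"}, "f3": {"B"}}))
# {'f2': 2.0, 'f3': 2.0, 'f1': 8.0}: f2 and f3 split link B; f1 takes the rest of A.
```

Each iteration freezes the flows crossing the most-constrained link at its fair share, which is why the number of optimization rounds (or waterfilling iterations) grows with problem size and motivates the single-optimization reformulation.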
Production systems use heuristics because they are faster or scale better than their optimal counterparts. Yet, practitioners are often unaware of the performance gap between a heuristic and the optimum or between two heuristics in realistic scenarios. We present MetaOpt, a system that helps analyze heuristics. Users specify the heuristic and the optimal (or another heuristic) as input, and MetaOpt automatically encodes these efficiently for a solver to find performance gaps and their corresponding adversarial inputs. Its suite of built-in optimizations helps it scale its analysis to practical problem sizes. To show it is versatile, we used MetaOpt to analyze heuristics from three domains (traffic engineering, vector bin packing, and packet scheduling). We found a production traffic engineering heuristic can require 30% more capacity than the optimal to satisfy realistic demands. Based on the patterns in the adversarial inputs MetaOpt produced, we modified the heuristic to reduce its performance gap by 12.5×. We examined adversarial inputs to a vector bin packing heuristic and proved a new lower bound on its performance.
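To convey the flavor of adversarial-input search, here is a toy brute-force stand-in (MetaOpt itself encodes both algorithms for an optimization solver rather than enumerating inputs): it searches small bin-packing instances for inputs where First-Fit opens more bins than the optimum.

```python
import itertools

CAP = 10  # bin capacity (toy setting)

def first_fit(items):
    """Place each item in the first bin with room; open a new bin otherwise."""
    bins = []
    for x in items:
        for b in bins:
            if sum(b) + x <= CAP:
                b.append(x)
                break
        else:
            bins.append([x])
    return len(bins)

def optimal(items):
    # First-Fit over some ordering always achieves the optimum, so brute-force
    # all orderings. Exponential: only viable at toy sizes.
    return min(first_fit(p) for p in itertools.permutations(items))

def adversarial_search(n=5):
    """Return the n-item instance maximizing First-Fit's extra bins."""
    worst = (0, ())
    for items in itertools.combinations_with_replacement(range(1, CAP + 1), n):
        gap = first_fit(items) - optimal(items)
        worst = max(worst, (gap, items))
    return worst

gap, items = adversarial_search()
print(f"worst 5-item instance: {items} (First-Fit opens {gap} extra bin(s))")
```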
While prior work has explored many proposed datacenter designs, only two designs, Clos-based and expander-based, are generally considered practical because they can scale using commodity switching chips. Prior work has used two different metrics, bisection bandwidth and throughput, for evaluating these topologies at scale. Little is known, theoretically or practically, about how these metrics relate to each other. Exploiting characteristics of these topologies, we prove an upper bound on their throughput, then show that this upper bound better estimates worst-case throughput than all previously proposed throughput estimators and scales better than most of them. Using this upper bound, we show that for expander-based topologies, unlike Clos, beyond a certain size of the network, no topology can have full throughput, even if it has full bisection bandwidth; in fact, even relatively small expander-based topologies fail to achieve full throughput. We conclude by showing that using throughput to evaluate datacenter performance instead of bisection bandwidth can alter conclusions in prior work about datacenter cost, manageability, and reliability.
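As a simple illustration of why cuts bound throughput (our estimator is tighter and does not enumerate cuts), any node set S forces the scale factor t of a routed demand to satisfy t ≤ cap(S)/dem(S); the sketch below brute-forces this generic bound on a toy graph:

```python
import itertools

def cut_upper_bound(nodes, cap, dem):
    """cap, dem: {(u, v): value}. Returns min over cuts of cap(S)/dem(S),
    an upper bound on the feasible demand scale factor t (toy brute force)."""
    best = float("inf")
    for r in range(1, len(nodes)):
        for subset in itertools.combinations(nodes, r):
            S = set(subset)
            c = sum(v for (a, b), v in cap.items() if a in S and b not in S)
            d = sum(v for (a, b), v in dem.items() if a in S and b not in S)
            if d > 0:
                best = min(best, c / d)
    return best

# 4-node ring with unit links and all-to-all unit demand.
nodes = [0, 1, 2, 3]
cap = {}
for i in nodes:
    cap[(i, (i + 1) % 4)] = 1
    cap[((i + 1) % 4, i)] = 1
dem = {(i, j): 1 for i in nodes for j in nodes if i != j}
print(cut_upper_bound(nodes, cap, dem))   # 0.5 (cut {0,1}: capacity 2, demand 4)
```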
Cloud providers install mitigations to reduce the impact of network failures within their datacenters. Existing network mitigation systems rely on simple local criteria or global proxy metrics to determine the best action. In this paper, we show that we can support a broader range of actions and select more effective mitigations by directly optimizing end-to-end flow-level metrics and analyzing actions holistically. To achieve this, we develop novel techniques to quickly estimate the impact of different mitigations and rank them with high fidelity. Our results on incidents from a large cloud provider show orders of magnitude improvements in flow completion time and throughput. We also show our approach scales to large datacenters.
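A toy sketch of the idea, with a made-up three-path topology and an ECMP-style goodput estimate standing in for our impact-estimation techniques: ranking by an end-to-end flow-level metric can prefer a different action than a local rule such as "always isolate the lossy device" or "do nothing while capacity remains".

```python
# Toy model, not the paper's estimator: three equal paths t0 -> t1, one of
# which delivers only 10% of what it carries due to packet loss.
PATHS = {"via_a0": 40.0, "via_a1": 40.0, "via_a2": 40.0}   # path capacities
LOSSY = {"via_a0": 0.10}                                    # fraction delivered
DEMAND = 60.0

def estimated_goodput(disabled):
    """Delivered throughput if `disabled` paths are taken out of service,
    assuming ECMP splits traffic evenly across the surviving paths."""
    alive = [p for p in PATHS if p not in disabled]
    if not alive:
        return 0.0
    per_path = DEMAND / len(alive)
    return sum(min(per_path, PATHS[p]) * LOSSY.get(p, 1.0) for p in alive)

candidates = {
    "do nothing": set(),
    "disable lossy path via a0": {"via_a0"},
    "disable a0 and a1 (too aggressive)": {"via_a0", "via_a1"},
}
for name, action in sorted(candidates.items(),
                           key=lambda kv: -estimated_goodput(kv[1])):
    print(f"{name:36s} -> estimated goodput {estimated_goodput(action):5.1f}")
# Disabling only the lossy path wins (60.0) over doing nothing (42.0) and
# over the overly aggressive action (40.0).
```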
Operators can deploy any scheduler they desire on existing switches through programmable packet schedulers: they tag packets with ranks (which indicate their priority) and schedule them in the order of these ranks. The ideal programmable scheduler is the Push-In First-Out (PIFO) queue, which schedules packets in a perfectly sorted order by “pushing” packets into any position of the queue based on their ranks. However, it is hard to implement PIFO queues in hardware due to their need to sort packets at line rate (based on their ranks). Recent proposals approximate PIFO behaviors on existing data-planes. While promising, they fail to simultaneously capture both of the necessary behaviors of PIFO queues: their scheduling behavior and admission control. We introduce PACKS, an approximate PIFO scheduler that addresses this problem. PACKS runs on top of a set of priority queues and uses packet-rank information and queue-occupancy levels during enqueue to determine whether to admit each incoming packet and to which queue it should be mapped. We fully implement PACKS in P4 and evaluate it on real workloads. We show that PACKS better approximates PIFO than state-of-the-art approaches. Specifically, PACKS reduces the rank inversions by up to 7× and 15× with respect to SP-PIFO and AIFO, and the number of packet drops by up to 60% compared to SP-PIFO. Under pFabric ranks, PACKS reduces the mean FCT across small flows by up to 33% and 2.6×, compared to SP-PIFO and AIFO. We also show that PACKS runs at line rate on existing hardware (Intel Tofino).
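A simplified enqueue/dequeue sketch in the spirit of PACKS (static rank buckets and a push-out admission rule stand in for the paper's occupancy-adaptive logic; this is illustrative, not the deployed algorithm):

```python
import collections

NUM_QUEUES, BUFFER = 4, 8
queues = [collections.deque() for _ in range(NUM_QUEUES)]
BOUNDS = [25, 50, 75, float("inf")]        # static rank buckets (illustrative)

def enqueue(rank):
    """Admit if there is room, or if the packet outranks the worst one queued;
    then map it to the priority queue owning its rank bucket."""
    if sum(len(q) for q in queues) >= BUFFER:
        worst_q = max((q for q in queues if q), key=max)
        if rank >= max(worst_q):
            return False                   # arriving packet is worst: drop it
        worst_q.remove(max(worst_q))       # push out the current worst instead
    for q, bound in zip(queues, BOUNDS):
        if rank <= bound:
            q.append(rank)
            return True

def dequeue():
    for q in queues:                       # strict priority across queues
        if q:
            return q.popleft()

for r in [10, 80, 35, 5, 60, 20, 90, 15, 40, 30]:
    enqueue(r)
print([dequeue() for _ in range(8)])  # roughly ascending: [10, 5, 20, 15, 35, 40, 30, 60]
```

Within a queue, packets stay FIFO, so small rank inversions remain; the quality of the bucket boundaries and the admission rule determine how closely the result tracks a true PIFO.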
We propose a performance analysis tool for learning-enabled systems that allows operators to uncover potential performance issues before deploying DNNs in their systems. Existing tools for this purpose require operators to faithfully model all components (a white-box approach) or to run inefficient black-box local search. We propose a gray-box alternative that eliminates the need to precisely model all of the system's components. Our approach is faster than prior work and finds scenarios in which the system performs substantially worse. We show that a state-of-the-art learning-enabled traffic engineering pipeline can underperform the optimal by 6×, a much larger gap than the one its authors reported.
Many problems that cloud operators solve are computationally expensive, and operators often use heuristic algorithms (that are faster and scale better than optimal) to solve them more efficiently. Heuristic analyzers enable operators to find when and by how much their heuristics underperform. However, these tools do not provide enough detail for operators to mitigate the heuristic's impact in practice: they only discover a single input instance that causes the heuristic to underperform (not the full set), and they do not explain why. We propose XPlain, a tool that extends these analyzers and helps operators understand when and why their heuristics underperform. We present promising initial results that show such an extension is viable.
Oblivious routing distributes traffic from sources to destinations following predefined routes with rules independent of traffic demands. While finding optimal oblivious routing with a concave objective is intractable for general topologies, we show that it is tractable for structured topologies often used in datacenter networks. To achieve this, we apply graph automorphism and prove the existence of an optimal automorphism-invariant solution. This result reduces the search space to targeting the optimal automorphism-invariant solution. We design an iterative algorithm to obtain such a solution by alternating between a convex optimization and a linear program. The convex optimization finds an automorphism-invariant solution based on representative variables and constraints, making the problem tractable. The linear program generates adversarial demands to ensure the final result satisfies all possible demands. Since constructing the representative variables and constraints is itself a combinatorial problem, we design polynomial-time algorithms for the construction. We evaluate the iterative algorithm in terms of throughput performance, scalability, and generality over three potential applications. The algorithm i) improves throughput by up to 87.5% for partially deployed FatTree and achieves up to 2.55× throughput gain for DRing over heuristic algorithms, ii) scales to all three considered topologies with a thousand switches, and iii) applies to general structured topologies with non-uniform link capacity and server distribution.
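A minimal sketch of the alternation on a three-node triangle with hose-model demands, using a linear max-utilization objective and omitting the automorphism-based reduction for brevity (assumes cvxpy and numpy; the topology, demand model, and seed demand are all toy choices):

```python
import cvxpy as cp
import numpy as np

nodes = [0, 1, 2]
edges = [(i, j) for i in nodes for j in nodes if i != j]   # 6 directed unit-capacity edges
eidx = {e: m for m, e in enumerate(edges)}
comms = edges[:]                                            # one commodity per ordered pair

# Two candidate paths per commodity: the direct edge, or via the third node.
paths = []
for (i, j) in comms:
    k = ({0, 1, 2} - {i, j}).pop()
    paths.append([[eidx[(i, j)]], [eidx[(i, k)], eidx[(k, j)]]])

def solve_master(demands):
    """Routing split minimizing the worst max-link utilization over a finite demand set."""
    f = cp.Variable((len(comms), 2), nonneg=True)          # path-split ratios
    z = cp.Variable()
    cons = [cp.sum(f, axis=1) == 1]
    for d in demands:
        for m in range(len(edges)):
            load = sum(d[c] * f[c, p] for c in range(len(comms))
                       for p in (0, 1) if m in paths[c][p])
            cons.append(load <= z)                          # unit capacities
    cp.Problem(cp.Minimize(z), cons).solve()
    return f.value, z.value

def worst_demand(f_val):
    """Adversarial step: hose-model demand maximizing the utilization of some link."""
    best_val, best_d = -1.0, None
    for m in range(len(edges)):
        d = cp.Variable(len(comms), nonneg=True)
        hose = []
        for v in nodes:                                     # unit ingress/egress rates
            hose.append(sum(d[c] for c, (s, t) in enumerate(comms) if s == v) <= 1)
            hose.append(sum(d[c] for c, (s, t) in enumerate(comms) if t == v) <= 1)
        load = sum(d[c] * f_val[c, p] for c in range(len(comms))
                   for p in (0, 1) if m in paths[c][p])
        prob = cp.Problem(cp.Maximize(load), hose)
        prob.solve()
        if prob.value > best_val:
            best_val, best_d = prob.value, d.value
    return best_d, best_val

demands = [np.ones(len(comms)) / 2]                        # seed demand
for it in range(20):
    f_val, z_val = solve_master(demands)
    d_adv, viol = worst_demand(f_val)
    print(f"iter {it}: guaranteed util {z_val:.3f}, adversarial util {viol:.3f}")
    if viol <= z_val + 1e-6:                               # no violating demand left
        break
    demands.append(d_adv)
```

The cutting-plane structure is the point: the master problem only ever sees the finitely many demands found so far, and the separation step certifies (or refutes) that the current routing is safe against the entire demand polytope.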
Production systems use heuristics because they are faster or scale better than the corresponding optimal algorithms. Yet, practitioners are often unaware of how much worse a heuristic's solution may be than the optimum in realistic scenarios. Leveraging two-stage games and convex optimization, we present a provable framework that unveils settings where a given heuristic underperforms.
Oblivious routing distributes traffic from sources to destinations following predefined routes with rules independent of traffic demands. While finding optimal oblivious routing is intractable for general topologies, we show that it is tractable for structured topologies often used in datacenter networks. To achieve this, we apply graph automorphism and prove the existence of an optimal automorphism-invariant solution. This result reduces the search space to targeting the optimal automorphism-invariant solution. We design an iterative algorithm to obtain such a solution by alternating between two linear programs. The first program finds an automorphism-invariant solution based on representative variables and constraints, making the problem tractable. The second program generates adversarial demands to ensure the final result satisfies all possible demands. Since constructing the representative variables and constraints is itself a combinatorial problem, we design polynomial-time algorithms for the construction. We evaluate the proposed iterative algorithm in terms of throughput performance, scalability, and generality over three potential applications. The algorithm i) improves throughput by up to 87.5% over a heuristic algorithm for partially deployed FatTree, ii) scales to FatClique with a thousand switches, and iii) applies to general structured topologies with non-uniform link capacity and server distribution.