Welcome to the Networked Systems Lab!

About

Founded in 2002, our laboratory conducts research on the design and implementation of a wide range of networked computing systems.

Recent Papers

  1. MobiSys
    AutoCast: Scalable Infrastructure-less Cooperative Perception for Distributed Collaborative Driving
    Qiu, Hang, Huang, Po-Han, Asavisanu, Namo, Liu, Xiaochen, Psounis, Konstantinos, and Govindan, Ramesh
    In 20th ACM International Conference on Mobile Systems, Applications, and Services (MobiSys 22) 2022
  2. NSDI
    CloudCluster: Unearthing the Functional Structure of a Cloud Service
    Pang, Weiwu, Panda, Sourav, Amjad, Jehangir, Diot, Christophe, and Govindan, Ramesh
    In 19th USENIX Symposium on Networked Systems Design and Implementation (NSDI 22) 2022

    In their quest to provide customers with good tools to manage cloud services, cloud providers are hampered by having very little visibility into cloud service functionality; a provider often only knows where VMs of a service are placed, how the virtual networks are configured, how VMs are provisioned, and how VMs communicate with each other. In this paper, we show that, using the VM-to-VM traffic matrix, we can unearth the functional structure of a cloud service and use it to aid cloud service management. Leveraging the observation that cloud services use well-known design patterns for scaling (e.g., replication, communication locality), we show that clustering the VM-to-VM traffic matrix yields the functional structure of the cloud service. Our clustering algorithm, CloudCluster, must overcome challenges imposed by scale (cloud services contain tens of thousands of VMs) and must be robust to orders-of-magnitude variability in traffic volume and measurement noise. To do this, CloudCluster uses a novel combination of feature scaling, dimensionality reduction, and hierarchical clustering to achieve clustering with over 92% homogeneity and completeness. We show that CloudCluster can be used to explore opportunities to reduce cost for customers and to identify anomalous traffic and potential misconfigurations.

  3. SoCC
    Scrooge: A Cost-Effective Deep Learning Inference System
    Hu, Yitao, Ghosh, Rajrup, and Govindan, Ramesh
    In SoCC ’21: Proceedings of the ACM Symposium on Cloud Computing 2021

    Advances in deep learning (DL) have prompted the development of cloud-hosted DL-based media applications that process video and audio streams in real time. Such applications must satisfy throughput and latency objectives and adapt to novel types of dynamics, while incurring minimal cost. Scrooge, a system that provides media applications as a service, achieves these objectives by packing computations efficiently into GPU-equipped cloud VMs, using an optimization formulation to find the lowest cost VM allocations that meet the performance objectives, and rapidly reacting to variations in input complexity (e.g., changes in participants in a video). Experiments show that Scrooge can reduce serving cost by 16-32% (which translates to tens of thousands of dollars per year) relative to the state of the art while achieving latency objectives for over 98% under dynamic workloads.

  4. TMC
    Synthesis of Large-Scale Instant IoT Networks
    Ghosh, Pradipta, Bunton, Jonathan, Pylorof, Dimitrios, Vieira, Marcos A. M., Chan, Kevin, Govindan, Ramesh, Sukhatme, Gaurav S., Tabuada, Paulo, and Verma, Gunjan
    IEEE Transactions on Mobile Computing 2021

    While most networks have long lifetimes, temporary network infrastructure is often useful for special events, pop-up retail, or disaster response. An instant IoT network is one that is rapidly constructed, used for a few days, then dismantled. We consider the synthesis of instant IoT networks in urban settings. This synthesis problem must satisfy complex and competing constraints: sensor coverage, line-of-sight visibility, and network connectivity. The central challenge in our synthesis problem is quickly scaling to large regions while producing cost-effective solutions. We explore two qualitatively different representations of the synthesis problem using satisfiability modulo convex optimization (SMC) and mixed-integer linear programming (MILP). The former is more expressive, for our problem, than the latter, but is less well-suited for solving optimization problems like ours. We show how to express our network synthesis in these frameworks. To scale to problem sizes beyond what these frameworks are capable of, we develop a hierarchical synthesis technique that independently synthesizes networks in sub-regions of the deployment area, then combines these. We find that, while MILP outperforms SMC in some settings for smaller problem sizes, the fact that SMC’s expressivity matches our problem ensures that it uniformly generates better quality solutions at larger problem sizes.

  5. SIGCOMM
    A Throughput-Centric View of the Performance of Datacenter Topologies
    Namyar, Pooria, Supittayapornpong, Sucha, Zhang, Mingyang, Yu, Minlan, and Govindan, Ramesh
    In Proceedings of the 2021 ACM SIGCOMM 2021 Conference 2021

    While prior work has explored many proposed datacenter designs, only two designs, Clos-based and expander-based, are generally considered practical because they can scale using commodity switching chips. Prior work has used two different metrics, bisection bandwidth and throughput, for evaluating these topologies at scale. Little is known, theoretically or practically, about how these metrics relate to each other. Exploiting characteristics of these topologies, we prove an upper bound on their throughput, then show that this upper bound better estimates worst-case throughput than all previously proposed throughput estimators and scales better than most of them. Using this upper bound, we show that for expander-based topologies, unlike Clos, beyond a certain size of the network, no topology can have full throughput, even if it has full bisection bandwidth; in fact, even relatively small expander-based topologies fail to achieve full throughput. We conclude by showing that using throughput to evaluate datacenter performance instead of bisection bandwidth can alter conclusions in prior work about datacenter cost, manageability, and reliability.

News

May 4 2022
Sarah Cooney accepts position at Villanova University. Congrats!

April 20 2022
Fawad Ahmad accepts position at Rochester Institute of Technology. Congrats!

April 1 2022
CloudCluster accepted to NSDI 2022

March 20 2022
AutoCast accepted to MobiSys 2022

December 1 2021
Mingyang Zhang joins Google NetInfra Team! Congrats!
