For publications prior to 2011, see DBLP.
On-vehicle 3D sensing technologies, such as LiDARs and stereo cameras, enable a novel capability, 3D traffic reconstruction. This produces a volumetric video consisting of a sequence of 3D frames capturing the time evolution of road traffic. 3D traffic reconstruction can help trained investigators reconstruct the scene of an accident. In this paper, we describe the design and implementation of RECAP, a system that continuously and opportunistically produces 3D traffic reconstructions from multiple vehicles. RECAP builds upon prior work on point cloud registration, but adapts it to settings with minimal point cloud overlap (both in the spatial and temporal sense) and develops techniques to minimize error and computation time in multi-way registration. On-road experiments and trace-driven simulations show that RECAP can, within minutes, generate highly accurate reconstructions whose errors are 2× or more lower than those of competing approaches.
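To make the registration terminology concrete, here is a minimal pairwise-alignment sketch using Open3D's stock ICP; the file paths, voxel size, and distance threshold are illustrative assumptions, and the sketch does not capture RECAP's actual contribution of handling minimal overlap and multi-way registration across vehicles.

```python
# Minimal pairwise point cloud registration with off-the-shelf ICP (Open3D).
# This is only a baseline sketch; RECAP adapts registration to low-overlap,
# multi-vehicle, multi-way settings, which this example does not attempt.
import open3d as o3d

def register_pair(source_path, target_path, voxel=0.5, max_dist=1.0):
    src = o3d.io.read_point_cloud(source_path).voxel_down_sample(voxel)
    dst = o3d.io.read_point_cloud(target_path).voxel_down_sample(voxel)
    src.estimate_normals()
    dst.estimate_normals()
    result = o3d.pipelines.registration.registration_icp(
        src, dst, max_dist,
        estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPlane())
    return result.transformation  # 4x4 rigid transform aligning src onto dst
```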
We propose a performance analysis tool for learning-enabled systems that allows operators to uncover potential performance issues before deploying DNNs in their systems. Existing tools for this purpose require operators to faithfully model all components (a white-box approach) or perform inefficient black-box local search. We propose a gray-box alternative, which eliminates the need to precisely model all of the system’s components. Our approach is faster than prior work and finds substantially worse-performing scenarios. We show that a state-of-the-art learning-enabled traffic engineering pipeline can underperform the optimal by 6×, a much larger gap than the one its authors reported.
Many problems that cloud operators solve are computationally expensive, and operators often use heuristic algorithms (which are faster and scale better than optimal algorithms) to solve them more efficiently. Heuristic analyzers enable operators to find when and by how much their heuristics underperform. However, these tools do not provide enough detail for operators to mitigate the heuristic’s impact in practice: they only discover a single input instance that causes the heuristic to underperform (not the full set of such inputs), and they do not explain why. We propose XPlain, a tool that extends these analyzers and helps operators understand when and why their heuristics underperform. We present promising initial results that show such an extension is viable.
Production systems use heuristics because they are faster or scale better than their optimal counterparts. Yet, practitioners are often unaware of the performance gap between a heuristic and the optimum or between two heuristics in realistic scenarios. We present MetaOpt, a system that helps analyze heuristics. Users specify the heuristic and the optimal (or another heuristic) as input, and MetaOpt automatically encodes these efficiently for a solver to find performance gaps and their corresponding adversarial inputs. Its suite of built-in optimizations helps it scale its analysis to practical problem sizes. To show it is versatile, we used MetaOpt to analyze heuristics from three domains (traffic engineering, vector bin packing, and packet scheduling). We found a production traffic engineering heuristic can require 30% more capacity than the optimal to satisfy realistic demands. Based on the patterns in the adversarial inputs MetaOpt produced, we modified the heuristic to reduce its performance gap by 12.5×. We examined adversarial inputs to a vector bin packing heuristic and proved a new lower bound on its performance.
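As a purely illustrative companion to this abstract, the sketch below brute-forces the kind of performance gap MetaOpt looks for, using one-dimensional bin packing as a stand-in; MetaOpt itself encodes the heuristic and the optimal as solver constraints rather than sampling inputs, and all names and parameters here are hypothetical.

```python
# Hypothetical illustration of searching for an adversarial input that maximizes
# the gap between a heuristic (first-fit) and the optimum (exhaustive search).
import random
from itertools import product

def first_fit(items, cap=1.0):                    # heuristic under analysis
    bins = []
    for x in items:
        for b in bins:
            if sum(b) + x <= cap:
                b.append(x)
                break
        else:
            bins.append([x])
    return len(bins)

def optimal(items, cap=1.0):                      # exhaustive search; tiny inputs only
    n = len(items)
    best = n
    for assign in product(range(n), repeat=n):    # assign[i] = bin index of item i
        loads = [0.0] * n
        for i, b in enumerate(assign):
            loads[b] += items[i]
        if max(loads) <= cap:
            best = min(best, len(set(assign)))    # number of bins actually used
    return best

def adversarial_search(trials=500, n_items=5):
    worst, gap = None, 0
    for _ in range(trials):
        items = [round(random.uniform(0.1, 0.7), 2) for _ in range(n_items)]
        g = first_fit(items) - optimal(items)
        if g > gap:                               # keep the worst input found so far
            worst, gap = items, g
    return worst, gap
```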
We consider the max-min fair resource allocation problem. The best-known solutions use either a sequence of optimizations or waterfilling, which only applies to a narrow set of cases. These solutions have become a practical bottleneck in WAN traffic engineering and cluster scheduling, especially at larger problem sizes. We improve both approaches: (1) we show how to convert the optimization sequence into a single fast optimization, and (2) we generalize waterfilling to the multi-path case. We empirically show our new algorithms Pareto-dominate prior techniques: they produce faster, fairer, and more efficient allocations. Some of our allocators also have theoretical guarantees: they trade off a bounded amount of unfairness for faster allocation. We have deployed our allocators in Azure’s WAN traffic engineering pipeline, where we preserve solution quality and achieve a roughly 3× speedup.
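For readers unfamiliar with waterfilling, here is a minimal sketch of the classic single-resource version (not the multi-path generalization or the single-optimization formulation described above): every unsatisfied demand repeatedly receives an equal share of the remaining capacity.

```python
# Classic single-resource waterfilling: split the remaining capacity equally
# among unsatisfied demands, freezing demands once they are fully met.
def waterfill(demands, capacity):
    alloc = {i: 0.0 for i in range(len(demands))}
    active = set(alloc)                        # demands not yet satisfied
    remaining = capacity
    while active and remaining > 1e-9:
        share = remaining / len(active)        # equal split of what is left
        for i in list(active):
            take = min(share, demands[i] - alloc[i])
            alloc[i] += take
            remaining -= take
            if demands[i] - alloc[i] < 1e-9:   # demand fully met: freeze it
                active.remove(i)
    return alloc

print(waterfill([1, 4, 10], capacity=9))       # -> {0: 1.0, 1: 4.0, 2: 4.0}
```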
When streaming 360° video, it is possible to reduce bandwidth by 5× with approaches that spatially segment video into tiles and only stream the user’s viewport. Unfortunately, it is difficult to accurately predict a user’s viewport even 2–3 seconds before playback. This results in rebuffering events, owing to misprediction of a user’s viewport or dips in network bandwidth, which hurt the interactive experience. However, avoiding rebuffering by naively skipping tiles that do not arrive by the playback deadline may lead to incomplete viewports and degraded experience. In this paper, we describe Dragonfly, a new 360° streaming system that preserves interactive experience by avoiding playback stalls while maintaining high perceptual quality. Dragonfly prudently skips tiles using a model that defines an overall utility function to decide which tiles to fetch, and at which qualities, with the goal of optimizing user experience. To minimize incomplete viewports, it also fetches a low-quality masking stream. Using a user study with 26 users and emulation-based experiments, we show that Dragonfly delivers higher quality and lower overheads than state-of-the-art 360° streaming approaches. For instance, in our study, 65% of sessions have a rating of 4 or higher (Good/Excellent) with Dragonfly, while only 16% of sessions with Pano and 13% with Flare achieve this rating.
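As a toy illustration of utility-driven tile selection, the sketch below greedily picks one quality per tile by utility per byte under a byte budget; the utility model, field names, and the omission of the masking stream are simplifying assumptions, not Dragonfly's actual formulation.

```python
# Greedy, utility-per-byte tile selection under a bandwidth budget (toy model).
def pick_tiles(candidates, budget_bytes):
    # candidates: (tile_id, quality_level, size_bytes, view_probability, quality_value)
    chosen, spent, used_tiles = [], 0, set()
    ranked = sorted(candidates, key=lambda c: c[3] * c[4] / c[2], reverse=True)
    for tile_id, quality, size, _, _ in ranked:
        if tile_id not in used_tiles and spent + size <= budget_bytes:
            chosen.append((tile_id, quality))   # at most one quality per tile
            used_tiles.add(tile_id)
            spent += size
    return chosen
```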
This paper presents AeroTraj, a system that enables fast, accurate, and automated reconstruction of 3D models of large buildings using a drone-mounted LiDAR. LiDAR point clouds can be used directly to assemble 3D models if their positions are accurately determined. AeroTraj uses SLAM for this, but must ensure complete and accurate reconstruction while minimizing drone battery usage. Doing this requires balancing competing constraints: drone speed, height, and orientation. AeroTraj exploits building geometry in designing an optimal trajectory that incorporates these constraints. Even with an optimal trajectory, SLAM’s position error can drift over time, so AeroTraj tracks drift in-flight by offloading computations to the cloud and invokes a re-calibration procedure to minimize error. AeroTraj can reconstruct large structures with centimeter-level accuracy and with an average end-to-end latency below 250 ms, significantly outperforming the state of the art.
With growing deployment of machine learning (ML) models, ML developers are training or re-training increasingly more deep neural networks (DNNs). They do so to find the most suitable model that meets their accuracy requirement while satisfying the resource and timeliness constraints of the target environment. In large shared clusters, the growing number of neural architecture search (NAS) and training jobs often result in models sharing architectural similarities with others from the same or a different ML developer. However, existing solutions do not provide a systematic mechanism to identify and leverage such similarities. We present ModelKeeper, the first automated training warmup system that accelerates DNN training by repurposing previously-trained models in a shared cluster. Our key insight is that initializing a training job’s model by transforming an already-trained model’s weights can jump-start it and reduce the total amount of training needed. However, models submitted over time can differ in their architectures and accuracy. Given a new model to train, ModelKeeper scalably identifies its architectural similarity with previously trained models, selects a parent model with high similarity and good model accuracy, and performs structure-aware transformation of weights to preserve maximal information from the parent model during the warmup of new model weights. Our evaluations across thousands of CV and NLP models show that ModelKeeper achieves 1.3×–4.3× faster training completion with little overhead and no reduction in model accuracy.
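A drastically simplified warm-start sketch in PyTorch appears below: it copies every parent tensor whose name and shape match the new model and leaves the rest at their fresh initialization. ModelKeeper's actual transformation is structure-aware and handles architecture mismatches, so treat this only as an illustration of the jump-start idea.

```python
# Naive warm start: reuse parent weights wherever name and shape already match.
import torch

def warm_start(child: torch.nn.Module, parent_state: dict) -> int:
    child_state = child.state_dict()
    copied = 0
    for name, tensor in parent_state.items():
        if name in child_state and child_state[name].shape == tensor.shape:
            child_state[name] = tensor.clone()   # take the trained weights
            copied += 1
    child.load_state_dict(child_state)           # unmatched tensors keep fresh init
    return copied                                # number of tensors warm-started
```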
Oblivious routing distributes traffic from sources to destinations following predefined routes with rules independent of traffic demands. While finding optimal oblivious routing with a concave objective is intractable for general topologies, we show that it is tractable for structured topologies often used in datacenter networks. To achieve this, we apply graph automorphism and prove the existence of an optimal automorphism-invariant solution. This result reduces the search space to automorphism-invariant solutions. We design an iterative algorithm to obtain such a solution by alternating between a convex optimization and a linear program. The convex optimization finds an automorphism-invariant solution based on representative variables and constraints, making the problem tractable. The linear program generates adversarial demands to ensure the final result satisfies all possible demands. Since constructing the representative variables and constraints is itself a combinatorial problem, we design polynomial-time algorithms for this construction. We evaluate the iterative algorithm in terms of throughput performance, scalability, and generality over three potential applications. The algorithm i) improves throughput by up to 87.5% for a partially deployed FatTree and achieves up to 2.55× throughput gain for DRing over heuristic algorithms, ii) scales to all three considered topologies with a thousand switches, and iii) applies to general structured topologies with non-uniform link capacity and server distribution.
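The alternation described above can be summarized by the following sketch; the topology-specific solvers are passed in by the caller because the concrete convex program, adversarial LP, and automorphism-based reduction depend on the topology and objective, so only the loop structure is taken from the abstract.

```python
# Generic cutting-plane-style alternation: re-solve the invariant routing against
# an expanding set of adversarial demands until no demand is served poorly.
def iterative_oblivious_routing(solve_invariant_routing, solve_adversarial_lp,
                                seed_demand, eps=1e-3, max_iters=100):
    demands = [seed_demand()]                        # initial demand set
    routing = None
    for _ in range(max_iters):
        routing = solve_invariant_routing(demands)   # Step 1: convex program
        worst, violation = solve_adversarial_lp(routing)  # Step 2: adversarial LP
        if violation <= eps:                         # routing handles all demands
            return routing
        demands.append(worst)                        # add the worst demand, repeat
    return routing
```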
It is common for the authors of a web page to include links to related pages on other sites. However, when users visit a page several years after it was last updated, they often find that some of the external links either do not work or point to unrelated content. To combat these problems of link rot and content drift, the solution used today is to capture a copy of the linked page when a link is created and serve this copy to users who choose to visit the link. We argue that this status quo ignores the reality that one does not always link to a page in order to point visitors to the content that existed on that page when the link was created. The utility of linking to a web page by simply directing users to that page’s URL is that they can benefit from any updates to the page’s content (e.g., corrections to news articles and new comments on a blog post) or access rich app-like functionality on the page (e.g., search). In this paper, we present a sketch of what it would take to make web links resilient while accounting for the dynamism of web pages.
Network Functions (NFs) now touch a significant fraction of Internet traffic. The hope has been that software-based NF Virtualization (NFV) would enable rapid development of new NFs by vendors and leverage the power and economics of commodity computing infrastructure for NF deployment. To date, no cloud NFV system achieves NF chaining, isolation, SLO adherence, and scaling together with existing cloud computing infrastructure and abstractions, all while achieving generality, speed, and ease of deployment; these properties are taken for granted in other cloud contexts but remain unavailable for NF processing. We present Quadrant, an efficient and secure cloud-deployable NFV system, and show that Quadrant’s approach of adapting existing cloud infrastructure to support packet processing can achieve NF chaining, isolation, generality, and performance in NFV. Quadrant reuses common cloud infrastructure such as Kubernetes, cloud functions, the Linux kernel, NIC hardware, and switches. It enables easy NFV deployment while delivering up to double the performance per core compared to the state of the art.
It is common for a web page to include links which help visitors discover related pages on other sites. When a link ceases to work (e.g., because the page it points to either no longer exists or has been moved), users could rely on an archived copy of the linked page. However, due to the incompleteness of web archives, a sizeable fraction of dead links have no archived copies. We study this problem in the context of Wikipedia. Broken external references on Wikipedia which lack archived copies are marked as "permanently dead". But we find this term to be a misnomer, as many previously dysfunctional links work fine today. For links which do not work, it is rarely the case that no archived copies exist. Instead, we find that the current policy for determining which archived copies of a URL are not erroneous is too conservative, and many URLs are archived for the first time only after they no longer work. We discuss the implications of our findings for Wikipedia and the web at large.
We present FedScale, a federated learning (FL) benchmarking suite with realistic datasets and a scalable runtime to enable reproducible FL research. FedScale datasets encompass a wide range of critical FL tasks, ranging from image classification and object detection to language modeling and speech recognition. Each dataset comes with a unified evaluation protocol using real-world data splits and evaluation metrics. To reproduce realistic FL behavior, FedScale contains a scalable and extensible runtime. It provides high-level APIs to implement FL algorithms, deploy them at scale across diverse hardware and software backends, and evaluate them at scale, all with minimal developer effort. We combine the two to perform systematic benchmarking experiments and highlight potential opportunities for heterogeneity-aware co-optimizations in FL. FedScale is open-source and actively maintained by contributors from different institutions at http://fedscale.ai. We welcome feedback and contributions from the community.
By repeatedly crawling and saving web pages over time, web archives (such as the Internet Archive) enable users to visit historical versions of any page. In this paper, we point out that existing web archives are not well designed to cope with the widespread presence of JavaScript on the web. Some archives store petabytes of JavaScript code, and yet many pages render incorrectly when users load them. Other archives which store the end-state of page loads (e.g., screen captures) break post-load interactions implemented in JavaScript. To address these problems, we present Jawa, a new design for web archives which significantly reduces the storage necessary to save modern web pages while also improving the fidelity with which archived pages are served. Key to enabling Jawa’s use at scale are our observations on a) the forms of non-determinism which impair the execution of JavaScript on archived pages, and b) the ways in which JavaScript’s execution fundamentally differs between live web pages and their archived copies. On a corpus of 1 million archived pages, Jawa reduces overall storage needs by 41%, when compared to the techniques currently used by the Internet Archive.
Autonomous vehicles use 3D sensors for perception. Cooperative perception enables vehicles to share sensor readings with each other to improve safety. Prior work in cooperative perception scales poorly even with infrastructure support. AUTOCAST enables scalable infrastructure-less cooperative perception using direct vehicle-to-vehicle communication. It carefully determines which objects to share based on positional relationships between traffic participants and the time evolution of their trajectories. It coordinates vehicles and optimally schedules transmissions in a distributed fashion. Extensive evaluation results under different scenarios show that, unlike competing approaches, AUTOCAST can avoid crashes and near-misses which occur frequently without cooperative perception, its performance scales gracefully in dense traffic scenarios, providing 2-4× visibility into safety-critical objects compared to existing cooperative perception schemes, its transmission schedules can be completed on a real radio testbed, and its scheduling algorithm is near-optimal with negligible computation overhead.
Oblivious routing distributes traffic from sources to destinations following predefined routes with rules independent of traffic demands. While finding optimal oblivious routing is intractable for general topologies, we show that it is tractable for structured topologies often used in datacenter networks. To achieve this, we apply graph automorphism and prove the existence of an optimal automorphism-invariant solution. This result reduces the search space to automorphism-invariant solutions. We design an iterative algorithm to obtain such a solution by alternating between two linear programs. The first program finds an automorphism-invariant solution based on representative variables and constraints, making the problem tractable. The second program generates adversarial demands to ensure the final result satisfies all possible demands. Since constructing the representative variables and constraints is itself a combinatorial problem, we design polynomial-time algorithms for the construction. We evaluate the proposed iterative algorithm in terms of throughput performance, scalability, and generality over three potential applications. The algorithm i) improves throughput by up to 87.5% over a heuristic algorithm for a partially deployed FatTree, ii) scales to a FatClique with a thousand switches, and iii) applies to general structured topologies with non-uniform link capacity and server distribution.
In their quest to provide customers with good tools to manage cloud services, cloud providers are hampered by having very little visibility into cloud service functionality; a provider often only knows where VMs of a service are placed, how the virtual networks are configured, how VMs are provisioned, and how VMs communicate with each other. In this paper, we show that, using the VM-to-VM traffic matrix, we can unearth the functional structure of a cloud service and use it to aid cloud service management. Leveraging the observation that cloud services use well-known design patterns for scaling (e.g., replication, communication locality), we show that clustering the VM-to-VM traffic matrix yields the functional structure of the cloud service. Our clustering algorithm, CloudCluster, must overcome challenges imposed by scale (cloud services contain tens of thousands of VMs) and must be robust to orders-of-magnitude variability in traffic volume and measurement noise. To do this, CloudCluster uses a novel combination of feature scaling, dimensionality reduction, and hierarchical clustering to achieve clustering with over 92% homogeneity and completeness. We show that CloudCluster can be used to explore opportunities to reduce cost for customers, identify anomalous traffic, and detect potential misconfigurations.
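A schematic version of the pipeline named in this abstract is sketched below with scikit-learn; the log scaling, PCA dimensionality, and externally supplied cluster count are illustrative assumptions rather than CloudCluster's actual design choices.

```python
# Schematic traffic-matrix clustering: scale, reduce dimensionality, cluster.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import AgglomerativeClustering

def cluster_vms(traffic_matrix: np.ndarray, n_clusters: int):
    # traffic_matrix[i, j] = bytes sent from VM i to VM j
    scaled = np.log1p(traffic_matrix)          # tame orders-of-magnitude variability
    n_components = min(8, *scaled.shape)       # keep the toy example well-defined
    reduced = PCA(n_components=n_components).fit_transform(scaled)
    return AgglomerativeClustering(n_clusters=n_clusters).fit_predict(reduced)
```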
Advances in deep learning (DL) have prompted the development of cloud-hosted DL-based media applications that process video and audio streams in real time. Such applications must satisfy throughput and latency objectives and adapt to novel types of dynamics, while incurring minimal cost. Scrooge, a system that provides media applications as a service, achieves these objectives by packing computations efficiently into GPU-equipped cloud VMs, using an optimization formulation to find the lowest-cost VM allocations that meet the performance objectives, and rapidly reacting to variations in input complexity (e.g., changes in the number of participants in a video). Experiments show that Scrooge can reduce serving cost by 16-32% (which translates to tens of thousands of dollars per year) relative to the state of the art while meeting latency objectives over 98% of the time under dynamic workloads.
While most networks have long lifetimes, temporary network infrastructure is often useful for special events, pop-up retail, or disaster response. An instant IoT network is one that is rapidly constructed, used for a few days, then dismantled. We consider the synthesis of instant IoT networks in urban settings. This synthesis problem must satisfy complex and competing constraints: sensor coverage, line-of-sight visibility, and network connectivity. The central challenge in our synthesis problem is quickly scaling to large regions while producing cost-effective solutions. We explore two qualitatively different representations of the synthesis problem, using satisfiability modulo convex optimization (SMC) and mixed-integer linear programming (MILP). The former is more expressive for our problem than the latter, but is less well-suited to solving optimization problems like ours. We show how to express our network synthesis in these frameworks. To scale to problem sizes beyond what these frameworks can handle, we develop a hierarchical synthesis technique that independently synthesizes networks in sub-regions of the deployment area and then combines them. We find that, while MILP outperforms SMC in some settings at smaller problem sizes, the fact that SMC’s expressivity matches our problem ensures that it uniformly generates better-quality solutions at larger problem sizes.
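To give a flavor of the MILP representation, here is a toy set-cover-style placement model in PuLP; the data format is hypothetical, and the paper's real formulation also encodes line-of-sight visibility and network connectivity, not just coverage.

```python
# Toy MILP: choose the cheapest subset of candidate sites so that every sensing
# target is covered by at least one chosen site (coverage constraint only).
import pulp

def min_cost_cover(cost, covers):
    # cost[s]: cost of placing a node at candidate site s
    # covers[t]: iterable of sites (keys of cost) that can cover target t
    prob = pulp.LpProblem("instant_iot_placement", pulp.LpMinimize)
    x = {s: pulp.LpVariable(f"x_{s}", cat="Binary") for s in cost}
    prob += pulp.lpSum(cost[s] * x[s] for s in cost)          # total placement cost
    for t, sites in covers.items():
        prob += pulp.lpSum(x[s] for s in sites) >= 1          # target t is covered
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return [s for s in cost if x[s].value() == 1]
```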
While prior work has explored many proposed datacenter designs, only two designs, Clos-based and expander-based, are generally considered practical because they can scale using commodity switching chips. Prior work has used two different metrics, bisection bandwidth and throughput, for evaluating these topologies at scale. Little is known, theoretically or practically, about how these metrics relate to each other. Exploiting characteristics of these topologies, we prove an upper bound on their throughput, then show that this upper bound better estimates worst-case throughput than all previously proposed throughput estimators and scales better than most of them. Using this upper bound, we show that for expander-based topologies, unlike Clos, beyond a certain size of the network, no topology can have full throughput, even if it has full bisection bandwidth; in fact, even relatively small expander-based topologies fail to achieve full throughput. We conclude by showing that using throughput to evaluate datacenter performance instead of bisection bandwidth can alter conclusions in prior work about datacenter cost, manageability, and reliability.
For decades, drafting Internet protocols has taken significant amounts of human supervision due to the fundamental ambiguity of natural language. Given such ambiguity, it is also not surprising that protocol implementations have long exhibited bugs. This pain and overhead can be significantly reduced with the help of natural language processing (NLP). We recently applied NLP to identify ambiguous or under-specified sentences in RFCs, and to generate protocol implementations automatically once the ambiguity is clarified. However, this system is far from general or deployable. To further reduce the overhead and errors due to ambiguous sentences, and to improve the generality of this system, much work remains to be done. In this paper, we consider what it would take to produce a fully general and useful system for easing the natural-language challenges in the RFC process.
Software is often used for Network Functions (NFs) – such as firewalls, NAT, deep packet inspection, and encryption – that are applied to traffic in the network. The community has hoped that NFV would enable rapid development of new NFs and leverage commodity computing infrastructure. However, the challenge for researchers and operators has been to align the square peg of high-speed packet processing with the round hole of cloud computing infrastructures and abstractions, all while delivering performance, scalability, and isolation. Past work has led to the belief that NFV is different enough that it requires novel, custom approaches that deviate from today’s norms. To the contrary, we show that we can achieve performance, scalability, and isolation in NFV by judiciously using mechanisms and abstractions of FaaS, the Linux kernel, NIC hardware, and OpenFlow switches. As such, with our system Galleon, NFV can be practically deployable today in conventional cloud environments while delivering up to double the performance per core compared to the state of the art.
For decades, Internet protocols have been specified using natural language. Given the ambiguity inherent in such text, it is not surprising that protocol implementations have long exhibited bugs. In this paper, we apply natural language processing (NLP) to effect semi-automated generation of protocol implementations from specification text. Our system, Sage, can uncover ambiguous or under-specified sentences in specifications; once these are clarified by the author of the protocol specification, Sage can generate protocol code automatically. Using Sage, we discover 5 instances of ambiguity and 6 instances of under-specification in the ICMP RFC; after fixing these, Sage is able to automatically generate code that interoperates perfectly with Linux implementations. We show that Sage generalizes to sections of BFD, IGMP, and NTP, and we identify additional conceptual components that Sage needs to support in order to generalize to complete, complex protocols like BGP and TCP.
Network Functions (NFs) perform on-path processing of network traffic. ISPs are deploying NF Virtualization (NFV), with software NFs run on commodity servers. ISPs aim to ensure that NF chains, directed acyclic graphs of NFs, do not violate Service Level Objectives (SLOs) promised by the ISP to its customers. To meet SLOs, NFV systems sometimes leverage on-path hardware (such as programmable switches and smart NICs) to accelerate NF execution. We present Lemur, a system that places and executes NF chains across heterogeneous hardware while meeting SLOs. Lemur’s novel placement algorithm yields an SLO-satisfying NF placement while weighing many constraints: hardware memory and processing stages, server cores, link capacity, NF profiles, and NF chain interactions. Lemur’s metacompiler automatically generates code and rules (in P4, Python, eBPF, C++, and OpenFlow) to stitch together cross-platform NF chain execution while also optimizing resource usage. Our experiments show that Lemur is alone among competing strategies in meeting SLOs for canonical NF chains while maximizing marginal throughput (the traffic rate in excess of the service-level objective).
The performance and availability of cloud and content providers often depend on the wide area networks (WANs) they use to interconnect their datacenters. WAN routers, which connect to each other using trunks (bundles of links), are sometimes built using an internal Clos topology connecting merchant-silicon switches. As such, these routers are susceptible to internal link and switch failures, resulting in reduced capacity and low availability. Based on the observation that today’s WAN routers use relatively simple trunk wiring and routing techniques, we explore the design of novel wiring and more sophisticated routing techniques to increase failure resilience. Specifically, we describe techniques to 1) optimize trunk wiring to increase effective internal router capacity so as to be resilient to internal failures, 2) compute the effective capacity under different failure patterns, and 3) use these to compute compact routing tables under different failure patterns, since switches have limited routing table sizes. Our evaluations show that our approach can mask failures of up to 75% of switches in some cases without exceeding routing table limits, whereas competing techniques can sometimes lose half of a WAN router’s capacity with a single failure.
An upcoming frontier for distributed computing might literally save lives in future military operations. In civilian scenarios, significant efficiencies were gained from interconnecting devices into networked services and applications that automate much of everyday life from smart homes to intelligent transportation. The ecosystem of such applications and services is collectively called the Internet of Things (IoT). Can similar benefits be gained in a military context by developing an IoT for the battlefield? This paper describes unique challenges in such a context as well as potential risks, mitigation strategies, and benefits.
Running data-parallel jobs across geo-distributed sites has emerged as a promising direction due to the growing need for geo-distributed cluster deployment. A key difference between geo-distributed and intra-cluster jobs is the heterogeneous (and often constrained) nature of compute and network resources across the sites. We propose Tetrium, a system for multi-resource allocation in geo-distributed clusters, that jointly considers both compute and network resources for task placement and job scheduling. Tetrium significantly reduces job response time, while incorporating several other performance goals with simple control knobs. Our EC2 deployment and trace-driven simulations suggest that Tetrium improves the average job response time by up to 78% compared to existing data-locality-based solutions, and up to 55% compared to Iridium, the recently proposed geo-distributed analytics system.
Autonomous vehicle prototypes today come with line-of-sight depth perception sensors like 3D cameras. These 3D sensors are used for improving vehicular safety in autonomous driving, but have fundamentally limited visibility due to occlusions, sensing range, and extreme weather and lighting conditions. To improve visibility and performance, we explore a capability called Augmented Vehicular Reality (AVR). AVR broadens the vehicle’s visual horizon by enabling it to wirelessly share visual information with other nearby vehicles. We show that AVR is feasible using off-the-shelf wireless technologies, and that it can qualitatively change the decisions made by autonomous vehicle path planning algorithms. Our AVR prototype achieves positioning accuracies within a few percent of car lengths and lane widths, and it is optimized to process frames at 30 fps.
Like today’s autonomous vehicle prototypes, vehicles in the future will have rich sensors to map and identify objects in the environment. For example, many autonomous vehicle prototypes today come with line-of-sight depth perception sensors like 3D cameras. These cameras are used for improving vehicular safety in autonomous driving, but have fundamentally limited visibility due to occlusions, sensing range, and extreme weather and lighting conditions. To improve visibility and performance, not just for autonomous vehicles but for other Advanced Driving Assistance Systems (ADAS), we explore a capability called Augmented Vehicular Reality (AVR). AVR broadens the vehicle’s visual horizon by enabling it to share visual information with other nearby vehicles, but requires careful techniques to align coordinate frames of reference, and to detect dynamic objects. Preliminary evaluations hint at the feasibility of AVR and also highlight research challenges in achieving AVR’s potential to improve autonomous vehicles and ADAS.
to be updated
The popularity of mobile apps continues to grow as developers take advantage of the sensors and data available on mobile devices. However, the increased functionality comes with a higher energy cost, which can be a problem for users of battery-constrained mobile devices. To improve the energy consumption of mobile apps, developers need detailed information about the energy consumption of their applications. Existing techniques have drawbacks that limit their usefulness or provide information at too coarse a granularity, such as the component or method level. Our approach calculates energy consumption information at the level of individual source lines. It does this by combining hardware-based power measurements with program analysis and statistical modeling. Our empirical evaluation shows that the approach is fast and accurate.
Mobile data usage is on a tremendous rise, due not only to an increasing number of users but also to an increase in the number of applications that transfer data over the network. Moreover, applications for sharing, sensing, and collaboration have become more popular, causing significant amounts of data to be generated on devices. Managing this data (syncing it to the cloud, or with other users or devices) is a crucial and often challenging part of writing mobile apps and services. In spite of plenty of good advice and best practices from OS vendors and network operators, storing and transferring mobile data is fraught with issues. On the one hand, an app developer needs to worry about the semantics of data storage and synchronization, and on the other, about the end-user experience, which may be impacted by poor and intermittent network connectivity. To address the needs of app developers and end users, we have built Izzy: a platform to rapidly develop and deploy data-centric mobile apps. Izzy provides well-defined and easy-to-use semantics for accessing local storage and for synchronizing data with a remote, scalable, global store. Izzy also provides global store access to the cloud-resident part of an application (if any) through a similar server API. Last but not least, Izzy is designed to be frugal: it conserves mobile device resources by applying delay tolerance and data reduction techniques (message coalescing and compression) across applications on a mobile device. In this paper we present the design of Izzy and our early experiences with using it.
Mobile app ecosystems have experienced tremendous growth in the last five years. As researchers and developers turn their attention to understanding the ecosystem and its different apps, instrumentation of mobile apps is a much needed emerging capability. In this paper, we explore a selective instrumentation capability that allows users to express instrumentation specifications at a high level of abstraction; these specifications are then used to automatically insert instrumentation into binaries. The challenge in our work is to develop expressive abstractions for instrumentation that can also be implemented efficiently. Designed using requirements derived from recent research that has used instrumented apps, our selective instrumentation framework, SIF, contains abstractions that allow users to compactly express precisely which parts of the app need to be instrumented. It also contains a novel path inspection capability, and provides users feedback on the approximate overhead of the instrumentation specification. Using experiments on our SIF implementation for Android, we show that SIF can be used to compactly (in 20-30 lines of code in most cases) specify instrumentation tasks previously reported in the literature. SIF’s overhead is under 2% in most cases, and its instrumentation overhead feedback is within 15% in many cases. As such, we expect that SIF can accelerate studies of the mobile app ecosystem.
Optimizing the energy efficiency of mobile applications can greatly increase user satisfaction. However, developers lack viable techniques for estimating the energy consumption of their applications. This project proposes a new approach that is lightweight in terms of its developer requirements and provides fine-grained estimates of energy consumption at the code level. It achieves this using a novel combination of program analysis and per-instruction energy modeling. In our evaluation, the approach estimates energy consumption to within 10% of the ground truth for a set of mobile applications from the Google Play store. Additionally, it provides useful and meaningful feedback that helps developers understand application energy consumption behavior.
to be updated
to be updated
to be updated
to be updated
to be updated
The availability of multiple sensors on mobile devices offers a significant new capability to enable rich user- and context-aware applications. Many of these applications run in the background to continuously sense user context. However, running these applications on mobile devices can impose significant stress on battery life, and the use of supplementary low-power processors has been proposed on mobile devices for continuous background activities. In this paper, we experimentally and analytically investigate the design considerations that arise in the efficient use of the low-power processor and provide a thorough understanding of the problem space. We answer fundamental questions such as which segments of an application are most efficiently hosted on the low-power processor, and how to select an appropriate low-power processor. We discuss our measurements, analysis, and results using multiple low-power processors and existing phone platforms.
Cloud operators increasingly need many fine-grained rules to better control individual network flows for various management tasks. While previous approaches have advocated placing rules either on hypervisors or switches, we argue that future data centers would benefit from leveraging rule processing capabilities at both for better scalability and performance. In this paper, we propose vCRIB, a virtualized Cloud Rule Information Base that allows operators to freely define different management policies without the need to consider underlying resource constraints. The challenge in our approach is the design of a vCRIB manager that automatically partitions and places rules at both hypervisors and switches to achieve a good trade-off between resource usage and performance.
Optimizing the energy efficiency of mobile applications can greatly increase user satisfaction. However, developers lack easily applied tools for estimating the energy consumption of their applications. This paper proposes a new approach, eCalc, that is lightweight in terms of its developer requirements and provides code-level estimates of energy consumption. The approach achieves this using estimation techniques based on program analysis of the mobile application. In evaluation, eCalc is able to estimate energy consumption within 9.5% of the ground truth for a set of mobile applications. Additionally, eCalc provides useful and meaningful feedback to the developer that helps to characterize energy consumption of the application.
The ubiquity of smartphones and their on-board sensing capabilities motivates crowd-sensing, a capability which harnesses the power of crowds to collect sensor data from a large number of mobile phone users. Unlike previous work on wireless sensing, crowd-sensing poses several novel requirements: support for humans-in-the-loop to trigger sensing actions or review results, the need for incentives, as well as privacy and security. In this paper, we design and implement Medusa, a novel programming framework for crowd sensing that satisfies these requirements. Medusa provides high-level abstractions for specifying the steps required to complete a crowd-sensing task, and employs a distributed runtime system that coordinates the execution of these tasks between smartphones and a cluster on the cloud. We have implemented ten crowd-sensing tasks on a prototype of Medusa. We find that Medusa task descriptions are two orders of magnitude smaller than standalone systems required to implement those crowd-sensing tasks, and the runtime has low overhead and is robust to dynamics and resource attacks.