The high energy consumption of visual sensing continues to impede the future of mobile vision, in which devices will continuously compute visual information from sensory data, e.g., for visual personal assistants or for augmented reality (AR). While vision algorithms continue to improve in task accuracy and speed, mobile and wearable vision systems fail to achieve sufficient battery life when vision tasks run continuously. Continuous video capture drains the battery of Google Glass in 30 minutes. It is well known that a common culprit is the energy-expensive traffic of image data [8, 9]. Transferring high-resolution frames at high frame rates draws substantial power for analog-to-digital conversion, sensor interface transactions, and memory usage. Simply capturing 1080p frames at 30 frames per second consumes more than 2.4 W of system power, measured on a MOTO Z smartphone. By contrast, capturing and displaying 480p frames consumes only 1.3 W of system power.

Image resolution therefore creates an interesting tradeoff for visual tasks: low resolution promotes low energy consumption, while high resolution promotes the imaging fidelity needed for high task accuracy. For example, as we explore in our AR marker-based pose estimation case study, lower resolutions suffice when an AR marker is close, but high resolutions are needed when the AR marker is far away or small. This tradeoff has been explored by several visual computing systems, including work on marker pose estimation, object detection, and face recognition [3–6, 9, 10, 13, 14, 17]. We too advocate that mobile vision systems should be able to situationally sacrifice image resolution to save system energy when imaging detail is unnecessary.

Unfortunately, any change in sensor resolution leads to a substantial pause in frame delivery. This is illustrated in Figure 1a.
We measure that reconfiguring sensor resolution in the Android OS prevents the application from receiving frames for about 267 ms, the equivalent of dropping 9 frames (working at 30 FPS) from vision processing pipelines. Consequently, computer vision applications do not change resolutions at runtime, despite the significant energy savings at lower resolutions. For example, augmented reality applications such as "Augment" and "UnifiedAR" run at 1080p at all times, drawing 2.7 W of system power.
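To make the stakes of this tradeoff concrete, the figures quoted above can be combined in a short back-of-the-envelope calculation. The sketch below is illustrative only: the function names are hypothetical, rounding the pause up to whole frame periods is an assumption of the sketch, and the 2.4 W and 1.3 W figures were not measured under identical conditions (the former is capture only, the latter includes display), so the resulting saving should be read as a rough estimate.

```python
import math

# Figures quoted in the text (system power in watts, pause in milliseconds).
POWER_1080P_W = 2.4        # capturing 1080p frames at 30 FPS
POWER_480P_W = 1.3         # capturing and displaying 480p frames
RECONFIG_PAUSE_MS = 267.0  # frame-delivery pause after a resolution change
FRAME_RATE_FPS = 30.0


def frames_missed(pause_ms: float, fps: float) -> int:
    """Number of frame periods touched by a pause, rounded up.

    Rounding up is an assumption of this sketch, not a stated method.
    """
    frame_period_ms = 1000.0 / fps
    return math.ceil(pause_ms / frame_period_ms)


def relative_saving(high_w: float, low_w: float) -> float:
    """Fraction of system power saved by running at the lower-power setting."""
    return (high_w - low_w) / high_w


def energy_saved_j(high_w: float, low_w: float, duration_s: float) -> float:
    """Energy in joules saved over duration_s seconds at the lower setting."""
    return (high_w - low_w) * duration_s


if __name__ == "__main__":
    missed = frames_missed(RECONFIG_PAUSE_MS, FRAME_RATE_FPS)
    saving = relative_saving(POWER_1080P_W, POWER_480P_W)
    per_minute = energy_saved_j(POWER_1080P_W, POWER_480P_W, 60.0)
    print(f"frame periods lost per resolution change: {missed}")
    print(f"relative power saving at 480p: {saving:.0%}")
    print(f"energy saved per minute at 480p: {per_minute:.0f} J")
```

Under these assumptions, a single resolution change costs about 9 frame periods, while running at 480p instead of 1080p cuts system power by roughly 46%, which is why the pause, rather than the energy benefit, is what keeps applications at a fixed resolution.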