We present the implementation of a large-scale latency estimation system based on GNP and incorporated into the Google content delivery network. Our implementation employs standard features of contemporary Web clients, and carefully controls the overhead incurred by latency measurements using a scalable centralized scheduler. It also requires only a small number of CDN modifications, which makes it attractive for any CDN interested in large-scale latency estimation. We investigate the issue of coordinate stability over time and show that coordinates drift away from their initial values with time, so that 25% of node coordinates become inaccurate by more than 33 ms after one week. However, daily re-computations make 75% of the coordinates stay within 6 ms of their initial values. Furthermore, we demonstrate that using coordinates to decide on client-to-replica re-direction leads to selecting replicas closest in term of measured latency in 86% of all cases. In another 10% of all cases, clients are re-directed to replicas offering latencies that are at most two times longer than optimal. Finally, collecting a huge volume of latency data and using clustering techniques enable us to estimate latencies between globally distributed Internet hosts that have not participated in our measurements at all. The results are sufficiently promising that Google may offer a public interface to the latency estimates in the future.
- Large-scale distributed systems
- Latency estimation
- Network modeling