### Abstract

A vector A of length N is defined implicitly, via a stream of updates of the form "add 5 to A_{3}." We give a sketching algorithm, that constructs a small sketch from the stream of updates, and a reconstruction algorithm, that produces a B-bucket piecewise-constant representation (histogram) H for A from the sketch, such that ∥A - H∥ ≤ (1 + ε) ∥A - H_{opt}∥, where the error ∥A - H∥ is either ℓ^{1} (absolute) or ℓ^{2} (root-mean-square) error. The time to process a single update, time to reconstruct the histogram, and size of the sketch are each bounded by poly(B, log(N), log ∥A∥, 1/ε). Our result is obtained in two steps. First we obtain what we call a robust histogram approximation for A, a histogram such that adding a small number of buckets does not help improve the representation quality significantly. From the robust histogram, we cull a histogram of desired accuracy and B buckets in the second step. This technique also provides similar results for Haar wavelet representations, under ℓ_{2} error. Our results have applications in summarizing data distributions fast and succinctly even in distributed settings.

Original language | English (US) |
---|---|

Pages (from-to) | 389-398 |

Number of pages | 10 |

Journal | Conference Proceedings of the Annual ACM Symposium on Theory of Computing |

State | Published - Jan 1 2002 |

Event | Proceedings of the 34th Annual ACM Symposium on Theory of Computing - Montreal, Que., Canada Duration: May 19 2002 → May 21 2002 |

### Fingerprint

### ASJC Scopus subject areas

- Software

### Cite this

*Conference Proceedings of the Annual ACM Symposium on Theory of Computing*, 389-398.