The weekly SILO Seminar Series is made possible through the generous support of the 3M Company and its Advanced Technology Group


with additional support from the Analytics Group of the Northwestern Mutual Life Insurance Company

Northwestern Mutual

Heavy hitters via cluster-preserving clustering

Jelani Nelson, Assistant Professor - Harvard

Date and Time: Apr 19, 2017 (12:30 PM)
Location: Orchard room (3280) at the Wisconsin Institute for Discovery Building



In the "heavy hitters" or "frequent items" problem, one must process a stream of items and report those items that occur frequently. For example, a telecommunications company may wish to find popular destination IP addresses in a packet stream across one of their links, or a search engine may wish to report popular query words. A more general problem is when there are two streams and we must report those items whose frequencies significantly deviate between them; the former problem is a special case since we can artificially pretend the first of the two streams the empty stream. Such a problem naturally arises in trend detection and anomaly detection. Several algorithms were known in the literature solving these problems, such as the CountMin sketch, CountSketch, Hierarchical CountSketch and others.