Seminar Announcement Speaker: Sourav Chakraborty, ISI Kolkata Date: Monday, 22 January 2024 Time: 2:00 PM Venue: Lecture Hall 2 Distinct Elements in Streams and the Klee's Measure Problem Sourav Chakraborty ISI Kolkata. 220124 Abstract We will present a very simple streaming algorithm on F_0 estimation that also caught the eye of Donald E. Knuth. In a recent article, Donald E. Knuth started with the following two paragraphs: "Sourav Chakraborty, N. V. Vinodchandran, and Kuldeep S. Meel have recently proposed an interesting algorithm for the following problem: A stream of elements (a1, a2,...,am) is input, one at a time, and we want to know how many of them are distinct. In other words, if A = {a1, a2,...,am} is the set of elements in the stream, with multiplicities ignored, we want to know A, the size of that set. But we don’t have much memory; in fact, A is probably a lot larger than the number of elements that we can hold in memory at any one time. What is a good strategy for computing an unbiased estimate of A? Their algorithm is not only interesting, it is extremely simple. Furthermore, it’s wonderfully suited to teaching students who are learning the basics of computer science. (Indeed, ever since I saw it, a few days ago, I’ve been unable to resist trying to explain the ideas to just about everybody I meet.) Therefore I’m pretty sure that something like this will eventually become a standard textbook topic. This note is an initial approximation to what I might write about it if I were preparing a textbook about data streams." This simple algorithm comes out of the first ever "efficient" streaming algorithm (from PODS 21) for the Klee's Measure problem, which was a big open problem in the world of streaming for many years. This work is based on joint works with N. V. Vinodchandran, and Kuldeep S. Meel across multiple articles, notable the following: * Estimating the Size of Union of Sets in Streaming Models. PODS 2021 * Estimation of the Size of Union of Delphic Sets: Achieving Independence from Stream Size. PODS 2022 * Distinct Elements in Streams: An Algorithm for the (Text) Book. ESA 2022 14
