Chennai Mathematical Institute

Seminars




12:00, Seminar Hall
Efficient Dimensionality Reduction for Sparse Binary Data

Rameshwar Pratap
Wipro AI-Research.
08-02-19


Abstract

In this work, we propose a dimension reduction algorithm for high dimensional, sparse, binary data. Our proposed algorithm provides a single solution by simultaneously preserving multiple similarity measures including Hamming distance, Inner product, Cosine and Jaccard Similarity, in the same sketch. In contrast to the “local projection” strategy used by most of the earlier algorithms, our approach exploits sparsity and combines the following two strategies: 1. partitioning the dimensions into several buckets, 2. obtaining “global linear summaries” within those buckets. Our algorithm is faster than the existing state-of-the-art, and it preserves the binary format of the data after the dimension reduction, which makes the sketch space efficient. Our algorithm can also be easily adapted in streaming and incremental learning frameworks. Our proposed algorithm is very simple and easy to implement in practice.