Research

Research Experience:

  1. On the Separability of Classes with the Cross-Entropy Loss Function (Jan ‘19 - May ‘19)
    Guide : Prof. Subhasis Chaudhuri, EE Department, IIT Bombay
    • Analyzed the separability of classes attained with the cross-entropy loss function for classification problems, by theoretically characterizing the intra-class and inter-class distances (i.e. the distance between any two points belonging to the same class and to different classes, respectively) in the feature space, i.e. the space of representations learned by neural networks.
    • Specifically, considered an arbitrary network architecture with a fully connected final layer and Softmax activation, trained using the cross-entropy loss. Derived expressions for the value and the distribution of the squared L2 norm of the product of a network-dependent matrix with a random intra-class or inter-class distance vector in the learned feature space (the transformation of the original data) just before the Softmax activation, as a function of the cross-entropy loss value.
    • The main result is a lower bound on the probability that the inter-class distance exceeds the intra-class distance in this feature space, as a function of the loss value. In line with intuition, this probability decreases as the loss value increases, i.e. the classes are better separated when the loss is low; a schematic statement appears below.
    • First theoretical work explaining the separability of classes in the feature space learned by neural networks trained with the cross-entropy loss function. Paper submitted to NeurIPS 2019.
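    Schematically, writing $z(x)$ for the feature vector of input $x$ just before the Softmax (illustrative notation, not the paper's exact symbols), the result takes the form
    $$d_{\text{intra}} = \lVert z(x_i) - z(x_j) \rVert_2 \;\; (y_i = y_j), \qquad d_{\text{inter}} = \lVert z(x_i) - z(x_k) \rVert_2 \;\; (y_i \neq y_k),$$
    $$\Pr\left(d_{\text{inter}} > d_{\text{intra}}\right) \;\geq\; g(L), \qquad \text{with } g \text{ decreasing in the cross-entropy loss } L.$$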
  2. Extremal Eigenvalue Analysis of the Hessian and a Learning Rate Choice for Stochastic Gradient Descent (Jan ‘19 - June ‘19)
    Guide : Prof. Subhasis Chaudhuri, EE Department, IIT Bombay
    • Analysis of the extremal (maximum and minimum) eigenvalues of the Hessian matrix is significant, especially for setting the learning rate/step size in gradient-based optimization algorithms and for gaining insight into the performance of saddle-point-escaping algorithms.
    • In relation to this, presented a technique to obtain probabilistic bounds on the extremal eigenvalues of the Hessian at a point, as a function of the loss value at that point. The technique is based on matrix concentration inequalities, and the computations are carried out for a single-hidden-layer neural network with the cross-entropy loss function. In principle, the technique extends to more complicated architectures and loss functions, provided the Hessian obeys a certain structure.
    • Second, proposed a method to automatically set the learning rate in mini-batch stochastic gradient descent, a core part of which also relies on matrix concentration inequalities. Two schemes are proposed: one in which the batch size is kept fixed, and another in which the batch size increases with the number of epochs, allowing much higher learning rates than in the first scheme and potentially leading to faster training (a simplified sketch of this second scheme appears below).
    • Paper submitted to SIAM Journal on Mathematics of Data Science (SIMODS).
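    A minimal sketch of the mechanics of the growing-batch scheme; the doubling schedule and linear learning-rate scaling here are placeholder choices, not the concentration-inequality-based rule derived in the paper:

    ```python
    import numpy as np

    def sgd_growing_batch(grad_fn, w, data, epochs=8, base_bs=32, base_lr=0.01):
        """Mini-batch SGD where the batch size grows with the epoch count.
        The doubling schedule and linear LR scaling are illustrative only."""
        n = len(data)
        for epoch in range(epochs):
            bs = min(n, base_bs * 2 ** (epoch // 2))  # batch size grows over epochs
            lr = base_lr * bs / base_bs               # larger batches allow larger steps
            np.random.shuffle(data)
            for i in range(0, n, bs):
                w = w - lr * grad_fn(w, data[i:i + bs])
        return w

    # toy demo: least squares on synthetic data
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1024, 5))
    data = np.hstack([X, (X @ np.ones(5))[:, None]])
    g = lambda w, b: 2 * b[:, :5].T @ (b[:, :5] @ w - b[:, 5]) / len(b)
    w = sgd_growing_batch(g, np.zeros(5), data)
    ```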
  3. A Randomized Algorithm to Detect and Escape Saddle Points (Aug ‘18 - Mar ‘19)
    Guide : Prof. Subhasis Chaudhuri, EE Department, IIT Bombay
    • Proposed a randomized iterative algorithm to detect whether a critical point (i.e. a point where the derivatives of the loss function with respect to the parameters are zero) is a local minimum or a saddle point, and to escape it if it is a saddle point, without computing the Hessian.
    • Essentially, the algorithm seeks the direction corresponding to the minimum eigenvalue of the Hessian (without computing or storing the Hessian) using iid random samples, which leaves ample scope for parallelizing the computations in each iteration; a sketch of this ingredient appears below.
    • Its time complexity is logarithmic in the dimension and approximately linear in the inverse of the magnitude of the minimum (negative) eigenvalue of the Hessian. An upper bound on the number of random samples is also derived. [Link]
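    A minimal sketch of the Hessian-free ingredient, assuming finite-difference Hessian-vector products and a shift that upper-bounds the largest eigenvalue; this is a standard construction in this line of work, not the paper's exact procedure:

    ```python
    import numpy as np

    def hvp(grad_fn, w, v, eps=1e-5):
        """Hessian-vector product H @ v via central differences of the gradient,
        so the Hessian is never formed or stored."""
        return (grad_fn(w + eps * v) - grad_fn(w - eps * v)) / (2 * eps)

    def min_eig_direction(grad_fn, w, dim, shift=10.0, iters=300, seed=0):
        """Power iteration on (shift*I - H): assuming shift >= max eigenvalue of H,
        its top eigenvector is the eigenvector of H with minimum eigenvalue."""
        v = np.random.default_rng(seed).normal(size=dim)
        v /= np.linalg.norm(v)
        for _ in range(iters):
            v = shift * v - hvp(grad_fn, w, v)
            v /= np.linalg.norm(v)
        lam = v @ hvp(grad_fn, w, v)  # Rayleigh quotient: estimated min eigenvalue
        return lam, v                 # lam < 0 at a critical point => saddle; step along +/- v

    # demo on f(w) = w0^2 - w1^2, whose critical point at the origin is a saddle
    lam, v = min_eig_direction(lambda w: np.array([2 * w[0], -2 * w[1]]), np.zeros(2), 2)
    print(lam)  # ~ -2, with v ~ (0, +/-1)
    ```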
  4. On the Existence of Sparse Bases for Deep Learning Kernels (Aug ‘18 - Sep ‘18)
    Guide : Prof. Subhasis Chaudhuri, EE Department, IIT Bombay
    • Gave a probabilistic argument suggesting the existence of a sparse basis, built from a few training points, for the final layer of binary classification networks before the sigmoid (i.e. the transformed input, which is linearly separable, the kernel being the transformation function) trained with the cross-entropy loss. The number of training points constituting this sparse basis is much smaller than the dimension of the transformed input (the dimension of the co-domain of the kernel function).
    • The hypothesis is corroborated by experimental results; a toy illustration appears below.
    • This has an important implication: even though a large number of examples might be required to train deep learning networks, the learnt kernel can perhaps generalize well using only a few of the training examples. [Link]
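    A toy numerical illustration of the hypothesis, assuming synthetic stand-in features rather than a trained network: a Lasso fit selects far fewer training points than the feature dimension.

    ```python
    import numpy as np
    from sklearn.linear_model import Lasso

    # Synthetic stand-in for final-layer features: d-dimensional vectors that
    # live in a low-dimensional span, as the hypothesis posits.
    rng = np.random.default_rng(0)
    d, n_train, k = 512, 100, 10
    hidden_basis = rng.normal(size=(k, d))
    train_feats = rng.normal(size=(n_train, k)) @ hidden_basis

    # Represent a new point's feature vector as a sparse combination of
    # training-point features; Lasso finds far fewer than d nonzero terms.
    target = rng.normal(size=k) @ hidden_basis
    fit = Lasso(alpha=0.01).fit(train_feats.T, target)
    print("training points used:", np.count_nonzero(fit.coef_),
          "of", n_train, "; feature dimension:", d)
    ```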
  5. Nonlinear Blind Compressed Sensing under Signal Dependent Noise (July ‘18 - Jan ‘19)
    Guide : Prof. Ajit Rajwade, CSE Department, IIT Bombay
    • Proposed an algorithm for non-linear blind compressed sensing (jointly estimating the sparse basis and the sparse codes), a non-convex problem, under signal-dependent Poisson noise.
    • Used the Anscombe transform (essentially a square-root transform that makes Poisson noise approximately Gaussian; see the sketch below) to formulate the problem as a nonlinear least-squares problem with an L1 penalty promoting sparsity. Solved this objective under non-negativity constraints on both the sparse codes and the basis, using a proposed multiplicative update rule similar to that used in NMF.
    • First formulation of nonlinear blind compressed sensing, with and without the Poisson noise model. Paper accepted for presentation at the IEEE International Conference on Image Processing (ICIP) 2019.
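    For reference, a minimal sketch of the Anscombe transform and its variance-stabilizing effect (the standard definition, independent of our algorithm):

    ```python
    import numpy as np

    def anscombe(x):
        """Anscombe transform: maps Poisson counts to values that are
        approximately Gaussian with variance ~1 (for means that are not too small)."""
        return 2.0 * np.sqrt(np.asarray(x, dtype=float) + 3.0 / 8.0)

    # quick check of variance stabilization across very different Poisson means
    rng = np.random.default_rng(0)
    for lam in (5, 20, 100):
        print(lam, anscombe(rng.poisson(lam, size=100_000)).std())  # all ~1
    ```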
  6. Sparse Kernel PCA (SKPCA) for Outlier Detection (Nov ‘17 - May ‘18)
    Guide : Prof. Suyash Awate, CSE Department, IIT Bombay
    • Proposed a novel SKPCA algorithm by formulating it as a constrained optimization problem with elastic net regularization in the kernel space and solving it via alternating minimization. Tested it on 5 real-world datasets and showed that it outperforms the recent ADMM-based SKPCA method of Wang et al. (2016), “Sparse kernel principal component analysis based on elastic net regularization”, with less parameter tuning; a sketch of the underlying idea appears below.
    • Also presented a new probabilistic proof justifying the existence of sparse solutions in KPCA with the RBF kernel, the first attempt in this direction.
    • Paper accepted for oral presentation at the IEEE International Conference on Machine Learning and Applications (ICMLA) 2018. [Code]
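    A minimal sketch of the underlying idea, assuming a post-hoc elastic-net sparsification of a dense KPCA component rather than our alternating-minimization formulation:

    ```python
    import numpy as np
    from sklearn.linear_model import ElasticNet
    from sklearn.metrics.pairwise import rbf_kernel

    # Dense KPCA first: top eigenvector of the doubly centered Gram matrix.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    n = len(X)
    K = rbf_kernel(X, gamma=0.5)
    J = np.ones((n, n)) / n
    Kc = K - J @ K - K @ J + J @ K @ J
    alpha = np.linalg.eigh(Kc)[1][:, -1]   # dense coefficient vector

    # Sparse surrogate: find sparse beta with Kc @ beta ~= Kc @ alpha under
    # an elastic-net penalty, so few training points define the component.
    fit = ElasticNet(alpha=0.01, l1_ratio=0.7).fit(Kc, Kc @ alpha)
    print("nonzero coefficients:", np.count_nonzero(fit.coef_), "of", n)
    ```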
  7. Multiple Instance Learning (MIL) in Breast Cancer Histology Images (Feb ‘18 - Present)
    Guide : Prof. Amit Sethi, EE Department, IIT Bombay
    • MIL is a weakly supervised learning problem in which only the label of the entire image (the “bag”) is given, and the labels of the patches (the “instances”) in the image must be inferred from it.
    • Worked on self-supervised learning using colorization as a proxy task, with different loss functions, to learn good embeddings for deep attention-based MIL (a sketch of the attention pooling appears below). Additionally, preliminary experiments on 3 medical datasets indicate that self-supervision via colorization with the MS-SSIM loss provides a good initialization for segmentation, leading to faster training as well as less overfitting. [Report]
    • Tried Bayesian learning for MIL using features extracted from auto-encoders and obtained results comparable to the state of the art on the Bisque dataset; however, this method did not generalize well. [Report]
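    A minimal sketch of attention-based MIL pooling on top of such embeddings, in the spirit of Ilse et al. (2018); the dimensions are illustrative:

    ```python
    import torch
    import torch.nn as nn

    class AttentionMILPooling(nn.Module):
        """Learns a weight per instance (patch) embedding and aggregates the
        weighted instances into one bag (image) representation for a bag label."""
        def __init__(self, embed_dim=128, attn_dim=64):
            super().__init__()
            self.attn = nn.Sequential(
                nn.Linear(embed_dim, attn_dim), nn.Tanh(), nn.Linear(attn_dim, 1)
            )
            self.classifier = nn.Linear(embed_dim, 1)  # bag-level binary logit

        def forward(self, instances):                       # (num_instances, embed_dim)
            w = torch.softmax(self.attn(instances), dim=0)  # instance weights
            bag = (w * instances).sum(dim=0)                # weighted average
            return self.classifier(bag), w.squeeze(-1)

    bag_logit, weights = AttentionMILPooling()(torch.randn(50, 128))
    ```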
  8. Sentence Compression Using Deep Learning (Mar ‘18 - May ‘18)
    Guide : Prof. Sunita Sarawagi, CSE Department, IIT Bombay
    • Designed a 3-layer bi-directional LSTM model for sentence compression by casting it as a binary classification problem (which words to retain or delete); a sketch appears below. Compared it with the method proposed in “Sentence Compression by Deletion with LSTMs” by Google NLP Research and obtained marginally better results. [Code] [Report]
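    A minimal PyTorch sketch of such a model; the vocabulary size and hyper-parameters are illustrative, not the project's exact configuration:

    ```python
    import torch
    import torch.nn as nn

    class DeletionBiLSTM(nn.Module):
        """3-layer bidirectional LSTM labelling each word keep/delete,
        i.e. sentence compression as per-token binary classification."""
        def __init__(self, vocab=10_000, emb=128, hidden=256):
            super().__init__()
            self.embed = nn.Embedding(vocab, emb)
            self.lstm = nn.LSTM(emb, hidden, num_layers=3,
                                bidirectional=True, batch_first=True)
            self.out = nn.Linear(2 * hidden, 1)  # per-token logit

        def forward(self, token_ids):            # (batch, seq_len) int ids
            h, _ = self.lstm(self.embed(token_ids))
            return self.out(h).squeeze(-1)       # (batch, seq_len) logits

    logits = DeletionBiLSTM()(torch.randint(0, 10_000, (4, 20)))
    loss = nn.BCEWithLogitsLoss()(logits, torch.randint(0, 2, (4, 20)).float())
    ```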
  9. Speeding up Kernel PCA (KPCA) (July ‘17 - Oct ‘17)
    Guide : Prof. Suyash Awate, CSE Department, IIT Bombay
    • Used the improved Nyström method to obtain a low-rank approximation of the Gram matrix. Using this, developed a fast algorithm for eigenvector computation in KPCA, improving the time complexity from $O(n^{2}p)$ to $O(np^{2})$, where $n$ is the number of data points and $p \ll n$ is the rank of the approximated Gram matrix.
    • Simulated the above algorithm and obtained an almost linear speed-up over MATLAB’s “eigs” function, with negligible error in the computed eigenvectors and eigenvalues (a plain-Nyström sketch appears below). [Code] [Report]
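    A minimal sketch of the construction, assuming plain uniform-sampling Nyström (the project used the improved Nyström variant):

    ```python
    import numpy as np
    from sklearn.metrics.pairwise import rbf_kernel

    def nystrom_kpca(X, p, gamma=0.5, seed=0):
        """Rank-p Nystrom approximation K ~= C W^+ C^T of the Gram matrix,
        reducing eigenvector computation to a p x p problem: O(n p^2) work
        instead of O(n^2 p)."""
        idx = np.random.default_rng(seed).choice(len(X), p, replace=False)
        C = rbf_kernel(X, X[idx], gamma=gamma)          # n x p
        Uw, Sw, _ = np.linalg.svd(C[idx])               # eigendecompose W (p x p)
        M = C @ (Uw / np.sqrt(np.maximum(Sw, 1e-12)))   # K ~= M @ M.T
        U, S, _ = np.linalg.svd(M, full_matrices=False)
        return U, S**2   # approximate top-p eigenvectors / eigenvalues of K

    U, lam = nystrom_kpca(np.random.default_rng(1).normal(size=(2000, 10)), p=50)
    ```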

Research Internship:

  1. Institute for Biomechanics, ETH Zürich, Switzerland (May ‘17 - July ‘17)
    Guide : Dr. Patrik Christen and Prof. Dr. Ralph Müller, D-HEST
    • Constructed a linear model for bone re-modelling with some dependence on initial conditions, obtained a closed-form solution for it, and analyzed its stability using eigenvalue analysis, which had not been done earlier (a sketch of the stability check appears below).
    • Also built a directed graphical model to capture the stochastic nature of the process and simulated it.
    • Developed an automated 2D-3D image registration framework for histology images from scratch: devised an efficient sampling strategy to obtain the 2D image across an arbitrary plane of the given 3D image, formulated a cost function (measuring similarity) that mitigates the effect of multiple local minima, chose a suitable optimization algorithm (tried Levenberg–Marquardt, Powell’s method, PSO, and genetic algorithms), and implemented the full pipeline.
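    A minimal sketch of the stability analysis, assuming a generic linear update $x_{t+1} = A x_t + b$ with random stand-in coefficients:

    ```python
    import numpy as np

    # The fixed point of x_{t+1} = A x_t + b is stable iff every eigenvalue
    # of A has magnitude < 1; the closed form then follows from x* = A x* + b.
    rng = np.random.default_rng(0)
    A = 0.4 * rng.normal(size=(4, 4))   # stand-in remodelling coefficients
    b = rng.normal(size=4)
    rho = np.abs(np.linalg.eigvals(A)).max()
    print("spectral radius:", rho)
    if rho < 1:
        x_star = np.linalg.solve(np.eye(4) - A, b)  # closed-form fixed point
        print("stable fixed point:", x_star)
    ```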

I was also one of the very few undergraduates selected to attend the PRAIRIE Artificial Intelligence Summer School (PAISS), co-organized by Inria and NAVER LABS Europe, in Grenoble, France in July 2018, where I presented a poster (available here) titled “Existence of Sparse Basis for Deep Learning Kernels?”.

Key Academic Projects:

  1. Using the Kernel Trick in Compressed Sensing (April ‘18 - May ‘18)
    Guide : Prof. Animesh Kumar, EE Department, IIT Bombay
    • Implemented the paper “Using the kernel trick in compressive sensing: Accurate signal recovery from fewer measurements”, which performs compressed sensing in a higher-dimensional feature space by utilizing the kernel trick. The method projects the data along random Gaussian directions and comes with a probabilistic error bound.
    • Extended the method to directions sampled from a Bernoulli distribution, making it more hardware-realizable, and provided a theoretical proof for this extension (both measurement ensembles are sketched below). [Report] [Presentation]
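    A minimal sketch of the two measurement ensembles; the sizes are illustrative, and the recovery solver and kernel-space lifting are omitted:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    m, d = 64, 256                                 # measurements << dimension

    # Gaussian directions (as in the paper) vs. Bernoulli +/-1 directions
    # (our extension) -- the latter needs only sign flips in hardware.
    Phi_gauss = rng.normal(size=(m, d)) / np.sqrt(m)
    Phi_bern = rng.choice([-1.0, 1.0], size=(m, d)) / np.sqrt(m)

    x = np.zeros(d)
    x[rng.choice(d, 8, replace=False)] = rng.normal(size=8)  # 8-sparse signal
    y_gauss, y_bern = Phi_gauss @ x, Phi_bern @ x            # compressive measurements
    ```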
  2. Extractive Text Summarization using Neural Networks (Sep ‘17 - Nov ‘17)
    Guide : Prof. Ganesh Ramakrishnan, CSE Department, IIT Bombay
    • Implemented the paper “A Simple but Tough-to-Beat Baseline for Sentence Embeddings” (sketched below) and used the embeddings to select key sentences in a document (extractive summarization), modelling it as a binary classification problem solved by an ensemble of neural networks.
    • Also designed a CNN architecture based on the EMNLP paper “Convolutional Neural Networks for Sentence Classification”, which further improved results. [Code] [Report]
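    A minimal sketch of the baseline's embedding step (smooth-inverse-frequency weighting plus common-component removal), with made-up word vectors and unigram frequencies:

    ```python
    import numpy as np

    def sif_embeddings(sentences, word_vecs, freqs, a=1e-3):
        """SIF sentence embeddings: weighted-average word vectors, then remove
        the projection onto the corpus's first singular vector."""
        embs = np.stack([
            np.mean([a / (a + freqs[w]) * word_vecs[w] for w in s], axis=0)
            for s in sentences
        ])
        u = np.linalg.svd(embs, full_matrices=False)[2][0]  # first right singular vector
        return embs - np.outer(embs @ u, u)

    # toy usage with made-up vectors and frequencies
    wv = {w: np.random.default_rng(hash(w) % 2**32).normal(size=50)
          for w in ["the", "cat", "sat", "dog", "ran"]}
    fr = {"the": 0.05, "cat": 1e-4, "sat": 1e-4, "dog": 1e-4, "ran": 1e-4}
    print(sif_embeddings([["the", "cat", "sat"], ["the", "dog", "ran"]], wv, fr).shape)
    ```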
  3. Image Segmentation using the GrabCut Algorithm (Feb ‘17 - April ‘17)
    Guide : Prof. Suyash Awate, CSE Department, IIT Bombay
    • Implemented GrabCut, which combines Gaussian Mixture Models (GMMs) with the graph cut algorithm for interactive extraction of the foreground in a complex environment with reduced user interaction.
    • Ran the algorithm on medical and natural images, obtaining good results (an OpenCV-based usage sketch appears below). [Code] [Report]
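    For comparison, the same pipeline via OpenCV's built-in implementation (the project implemented the algorithm itself; the file name and rectangle here are hypothetical):

    ```python
    import numpy as np
    import cv2

    img = cv2.imread("input.jpg")                   # hypothetical input file
    mask = np.zeros(img.shape[:2], np.uint8)
    bgd = np.zeros((1, 65), np.float64)             # GMM state for background
    fgd = np.zeros((1, 65), np.float64)             # GMM state for foreground
    rect = (50, 50, 200, 200)                       # user-drawn foreground box
    cv2.grabCut(img, mask, rect, bgd, fgd, 5, cv2.GC_INIT_WITH_RECT)

    # keep definite and probable foreground pixels
    fg = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 1, 0).astype(np.uint8)
    segmented = img * fg[:, :, None]
    ```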
  4. Real Time Tracking of Non-Rigid Objects (Feb ‘17 - April ‘17)
    Guide : Prof. Ajit Rajwade, CSE Department, IIT Bombay
    • Built a real-time object tracking model for videos using the mean shift algorithm with the Bhattacharyya coefficient (sketched below) to determine the object trajectory. It is robust to partial occlusion, clutter, rotation, and camera position.
    • The model successfully tracked humans, objects, vehicles, etc. in real-world videos. [Code] [Report]
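    A minimal sketch of the similarity measure at the core of the tracker, assuming joint RGB histograms as the target model:

    ```python
    import numpy as np

    def bhattacharyya(p, q):
        """Bhattacharyya coefficient between two normalized histograms:
        1 for identical distributions, 0 for disjoint support. Mean shift
        tracking maximizes this between the target model and the candidate
        region's histogram."""
        return np.sum(np.sqrt(p * q))

    def color_histogram(patch, bins=16):
        """Normalized joint RGB histogram of an image patch (H x W x 3, uint8)."""
        h, _ = np.histogramdd(patch.reshape(-1, 3), bins=bins, range=[(0, 256)] * 3)
        return h.ravel() / h.sum()

    rng = np.random.default_rng(0)
    a = rng.integers(0, 256, (32, 32, 3), dtype=np.uint8)
    print(bhattacharyya(color_histogram(a), color_histogram(a)))  # 1.0
    ```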
  5. Visible Light Communication (Li-Fi) (Jan ‘17 - April ‘17)
    Guide : Prof. Kumar Appaiah, EE Department, IIT Bombay
    • Built an optical channel to transfer a Manchester encoded data stream synchronously.
    • Used a Tiva-C micro-controller to transmit the encoded data, which was received via a Clock Recovery Circuit, then successfully decoded and displayed on an LCD by the receiving micro-controller (the line coding is sketched below).
    • Synchronously transferred encoded data at a speed of 100 kbps over a distance of 3 meters. Also built an asynchronous system with a data rate of 30 kbps over a 0.5 meter distance.
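    A Python model of the line coding (the firmware itself ran in C on the Tiva-C), using the IEEE 802.3 convention:

    ```python
    # Manchester encoding (IEEE 802.3 convention: 0 -> high-low, 1 -> low-high):
    # each data bit becomes two half-bit levels, guaranteeing a mid-bit
    # transition that the clock recovery circuit can lock onto.
    def manchester_encode(bits):
        return [half for b in bits for half in ((0, 1) if b else (1, 0))]

    def manchester_decode(halves):
        pairs = zip(halves[::2], halves[1::2])
        return [1 if pair == (0, 1) else 0 for pair in pairs]

    data = [1, 0, 1, 1, 0]
    assert manchester_decode(manchester_encode(data)) == data
    ```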
  6. Flow Based Image Extraction (Sep ‘16 - Nov ‘16)
    Guide : Prof. Suyash Awate & Ajit Rajwade, CSE Department, IIT Bombay
    • Implemented a non-photorealistic rendering method that gives images a stylized effect.
    • Applied a flow-based difference-of-Gaussians filter for line extraction, followed by a flow-based bilateral filter for region smoothing, to produce stylized versions of natural images (a simplified line-extraction sketch appears below). [Code]
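    A simplified sketch of the line-extraction step, assuming an isotropic (non-flow-guided) difference-of-Gaussians; the actual project aligned the filters to the edge tangent flow:

    ```python
    import numpy as np
    from scipy.ndimage import gaussian_filter

    def dog_edges(gray, sigma=1.0, k=1.6, tau=0.98, eps=0.0):
        """Difference-of-Gaussians line extraction: subtract two blurs at
        scales sigma and k*sigma, then threshold to mark dark line responses."""
        g1 = gaussian_filter(gray.astype(float), sigma)
        g2 = gaussian_filter(gray.astype(float), k * sigma)
        return ((g1 - tau * g2) < eps).astype(np.uint8)  # 1 where a line is detected

    edges = dog_edges(np.random.default_rng(0).integers(0, 256, (64, 64)))
    ```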
  7. Min-cut based approach to find pathways in biological regulatory networks (Dec ‘15 - Jan ‘16)
    Guide : Prof. Supratik Chakraborty, CSE Department, IIT Bombay
    • Implemented an efficient semi-automated approach for finding pathways in systems-biology regulatory networks using min-cuts.
    • Implemented Gusfield’s algorithm in C++ to construct the Gomory–Hu tree of the equivalent undirected graph, which yields the min-cut between every pair of nodes using only $O(n)$ max-flow computations instead of the $O(n^{2})$ required naively, thereby providing a linear speed-up (a usage sketch appears below).
    • Also optimized the code’s memory usage by exploiting the sparsity of the adjacency matrix.
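    A minimal sketch of the all-pairs min-cut query via a Gomory–Hu tree, using networkx's built-in construction in place of our C++ Gusfield implementation (the example graph is illustrative):

    ```python
    import networkx as nx

    G = nx.Graph()
    G.add_weighted_edges_from(
        [(0, 1, 3), (0, 2, 1), (1, 2, 4), (1, 3, 2), (2, 3, 5)], weight="capacity"
    )
    T = nx.gomory_hu_tree(G, capacity="capacity")  # built with n - 1 max-flow calls

    # min-cut value between u and v = lightest edge on the tree path u -> v
    u, v = 0, 3
    path = nx.shortest_path(T, u, v)
    cut = min(T[a][b]["weight"] for a, b in zip(path, path[1:]))
    print(cut)
    ```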