Theoretical Research in Deep Learning


Published: September 12, 2018

This post contains some guidelines (gathered from my own experience and from some highly experienced people) for doing theoretical research in deep learning (and machine learning in general), strictly for newbies!

I am relatively new to this area myself, with about a year of experience. In my Dual Degree Project (essentially my Master’s Thesis), I am working on some fundamental theoretical aspects of deep learning. Given that all my advisor’s PhD students are working on application-based topics, my advisor was very keen on having new students work on some mathematical topics related to deep learning. And I couldn’t be happier! Apart from this, in my fourth year, I took an R&D course in which I worked on some theoretical problems on kernels in machine learning.

Before anyone ventures into the theoretical field of deep learning, a word of caution - it is NOT at all easy to make significant contributions in this field! One must have a really solid mathematical background as well as the patience to go through several highly non-trivial (and often extremely arduous) papers. It can get extremely frustrating at times for several reasons, such as not being able to come up with good ideas, not being able to develop a potential idea properly, or wasting hours or days on something which leads to nothing. But if you are up for the challenge, it should be really enjoyable.

First of all, you have to read several theoretical papers from top AI conferences like ICML, NIPS and ICLR, since every year tons of excellent and highly original theoretical papers are published in these conferences. The papers in NIPS and ICML are not restricted to deep learning and cover several other domains, whereas ICLR is a bit more deep learning centred. Nevertheless, the quality of the theoretical papers is very high in all these conferences. The papers published in these top conferences give you a good idea of which theoretical problems are currently relevant or important in the AI community. In my case, I started off with the papers published in ICML 2017 and ICLR 2018. I glanced through the list of accepted papers and only looked at the abstracts of the ones whose titles seemed interesting enough to me. This saves you a lot of time, as the list is just huge. I mainly looked at papers on the expressive power of neural networks (ICLR 2018 had loads of them!) and the ones on optimization in general. So you could also pick some specific topics which you prefer.

Once you have read several (the quantification of ‘several’ is subjective) papers, you have to find gaps in the existing work, or a significant extension/improvement of someone else’s work that (“to the best of your knowledge”) has not been attempted so far. A potential problem here is struggling to come up with new ideas or to develop a potential idea concretely. It happened to me as well. Talk about this to your advisor or to any acquaintance who is actively involved in theoretical work. Thanks to their experience, they can often suggest good ideas, or some other direction altogether that they are very optimistic about. I think experiments are really good stimuli for ideas, especially in deep learning. I have seen quite a few papers which perform interesting experiments to raise some specific issue, which is then resolved theoretically either in that paper itself or in a subsequent paper. So for instance, you could pick some specific algorithm, try to figure out cases where it fails and why the same failure could be a problem in other cases too, and possibly suggest some remedy to fix it. Additionally, some empirical modelling/observations/facts (please proceed with caution here!) could be used in conjunction with elaborate theory, especially for the really complicated analysis which often comes up in deep learning.
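To make the "pick an algorithm and find where it fails" suggestion concrete, here is a minimal sketch of such an experiment. Everything in it is an assumption chosen purely for illustration and is not taken from any particular paper: the objective is the one-dimensional quadratic f(x) = 0.5 * c * x^2, the algorithm is plain gradient descent, and the names `gradient_descent`, `curvature` and `step_size` are hypothetical. For this objective each update multiplies x by (1 - step_size * c), so the iterates shrink only when step_size is below 2 / c.

```python
def gradient_descent(curvature, step_size, x0=1.0, num_steps=50):
    """Run plain gradient descent on f(x) = 0.5 * curvature * x**2.

    Returns the final iterate. All names and default values here are
    illustrative assumptions, not part of any specific paper or library.
    """
    x = x0
    for _ in range(num_steps):
        grad = curvature * x      # derivative of 0.5 * curvature * x**2
        x = x - step_size * grad  # standard gradient descent update
    return x


if __name__ == "__main__":
    curvature = 4.0  # hypothetical value; the stability threshold is 2 / curvature
    for step_size in (0.1, 0.4, 0.6):
        final_x = gradient_descent(curvature, step_size)
        outcome = "shrinks towards 0" if abs(final_x) < 1e-6 else "blows up"
        print(f"step_size={step_size}: final |x| = {abs(final_x):.3e} ({outcome})")
```

An observation like this (the method fails once the step size crosses a threshold) is the kind of empirical hint that can then be turned into a precise statement and proved, here simply by noting that |1 - step_size * c| must be less than 1.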

Now that you have some really cool ideas, it is imperative that you do a thorough literature survey specific to your topic, in order to make sure that your idea has not already been published by other smart people. I say this from personal experience. In my fourth year, I spent two weeks conceiving a novel algorithm and its proof based on some ideas I had read elsewhere, only to find out a week later (and that too after emailing it to my guide) that it had already been published! It was very disheartening and also a massive waste of time. It may also happen that someone has come up with something better than your idea, but you are not aware of it because of an insufficient literature survey or, as in my case, because it was published just days before my submission deadline! So always be on the lookout for similar papers (i.e. papers related to your topic), especially if you are working on something very recent.

Once you are sure that your super cool idea is completely novel, start writing it down properly. I, for instance, get way too carried away with the ideas in my mind. It is only when I start writing them down properly that problems begin to show up! Also, hand-wavy arguments are an absolute no-no in theoretical papers. I recommend writing it down in the form of a paper (here I’m assuming that your ultimate goal is to get a publication), because doing so forces you to write each and every step properly and clearly, which will help you spot the sketchy parts of your proofs. After writing it down properly, have your advisor and any mathematically inclined person read all of it carefully. You do not want to put something incorrect in your paper! But more than the math itself, your paper should be lucid enough for other people to understand. Please realize that your idea is completely new to them, and if it’s not presented well enough, you might not get favourable reviews/responses even though your idea is brilliant. So presentation plays a key role in theoretical papers. My advisor told me all of this, since the initial draft of my paper made very little sense to him. Needless to say, paper writing is a highly iterative process. You have to constantly make changes as you get it reviewed by others. Usually the final draft of your paper will differ significantly from your initial draft.

I hope these guidelines prove useful to some of you. Best of luck!


Rudrajit Das
