Data is often only weakly annotated: for example, for a medical image, we might know the patient’s overall diagnosis, but not where the abnormalities are located, because obtaining ground-truth annotations is very time-consuming.
Multiple instance learning (MIL) is an extension of supervised machine learning, aimed at dealing with such weakly labeled data. For example, a MIL classifier trained only on images labeled as healthy or abnormal would be able to label both a previously unseen image and local patches in that image. Besides computer-aided diagnosis, MIL has been applied to image recognition, drug activity prediction, text categorization and other problems.
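To make the image-versus-patch idea concrete, here is a minimal sketch in Python. It assumes the common MIL convention that an image (a "bag") is abnormal if at least one of its patches (its "instances") is abnormal. That convention, the function name `predict_bag`, the 0.5 threshold and the patch scores are all illustrative assumptions, not a description of a specific algorithm used in the lab.

```python
def predict_bag(instance_scores, threshold=0.5):
    """Label an image (bag) and its patches (instances).

    instance_scores: abnormality scores in [0, 1], one per patch, as
    produced by some patch-level classifier (hypothetical here).
    Assumes a bag is positive if at least one instance is positive.
    """
    # Patch-level predictions: True means "abnormal patch".
    instance_labels = [score >= threshold for score in instance_scores]
    # Image-level prediction: abnormal if any patch is abnormal.
    bag_label = any(instance_labels)
    return bag_label, instance_labels


# Made-up scores for two images with three patches each.
healthy_image = [0.10, 0.20, 0.05]
abnormal_image = [0.10, 0.90, 0.30]

print(predict_bag(healthy_image))
print(predict_bag(abnormal_image))
```

During training only the image-level label is available, so the patch-level output is a by-product that cannot be checked directly against ground truth.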
There are still a number of open research directions. For example, how can we evaluate the patch-level predictions without ground-truth labels? Could we improve MIL algorithms by asking experts only a few questions, where they verify the algorithm’s decisions? What can we learn about MIL in medical imaging from other applications where it has been applied?
As an MSc student you would choose one or more medical imaging applications you are interested in, using either the datasets already available in the lab or open datasets from Kaggle or grand-challenge.org, and work with me on formulating your own research question. Participating in a machine learning competition, creating open source tools and/or writing a paper for a scientific conference are also encouraged.