We have participated in contract research projects on regression, classification and survival modelling for over a decade. Our recurring research questions are: How can we systematically improve the performance of the predictive models we build? How can this improvement best be automated?
Our starting premise is that every model or expert system that makes predictions ought to be able to return "don't know" predictions. We think it is more fruitful (and more honest) to define a model's non-applicability domain rather than its applicability domain. Our research questions are: How should "don't know" predictions be defined in practice? Why are they important?
Since 2010 we have worked on contract research projects whose goal, where possible, was to find a sub-population with better or worse survival than the rest of the population. Our research questions are: How can such a sub-population be found, if it exists? Which statistics best measure the difference?
While working on numerous right-censored (survival) datasets, we have noticed how "fragile" they are and how strongly our estimates depend on particular data points and on particular assumptions. Our research questions are: How should predictive models be built from survival datasets? How should such models be reported? What are the minimum requirements for a survival prediction model to be used in clinical practice?