Research Centre for Cheminformatics, Jasenova 7, 11030 Beograd, Serbia
Stankovic et al. recently published a paper in The Journal of Digestive Diseases in which they report their research on inflammatory bowel disease (IBD) and variations in the NOD2, TLR4, TNF-α, IL-6, IL-1β and IL-1RN genes. Based on their findings, they conclude that genes involved in immune regulation are genetic factors of importance in IBD susceptibility and can be used as predictors of disease development.
We disagree with the way Stankovic et al. reached their conclusion. We are also surprised that so much crucial information is missing from their paper.
1. The authors applied repeated k-fold cross-validation to calculate the area under the receiver operating characteristic (ROC) curve (AUC) for four model types. We agree with them that the AUC is a ranking measure, and we believe that they calculated it correctly. We also believe that they appropriately applied the process of selecting an optimal model as described by Krstajic et al. However, some crucial information is missing from the paper:
a) Stankovic et al. presented only a ranking statistic, i.e. the AUC, for their predictive models. Even though the AUC is a widely used and relevant measure, it is not sufficient for many medical applications. At the very least, sensitivity and specificity need to be reported.
b) The authors state that the AUC has become standard practice for genetic prediction models and cite a paper by Janssens et al. as their reference for using the AUC. The full title of that paper is “Predictive genetic testing for type 2 diabetes may raise unrealistic expectations”. It does not mention the AUC at all.
c) The authors report only a single AUC value for each model type, with no confidence interval or any other range of AUC values attached to it. By reporting just a single value, the authors failed to provide any measure of its variability, i.e. of the confidence one can place in it.
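2. The authors performed the selection of genes on the entire training set and then applied the model building process to the same training set. This is a flawed procedure in predictive model building, as Ambroise and McLachlan have pointed out, and it is known as “selection bias”: the samples later used to estimate performance have already influenced which genes were selected, so the resulting estimate is optimistic.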
3. Conclusions about predictive model performance may be derived either from an independent test set (not used in model building) or from more sophisticated resampling methods such as nested cross-validation. Stankovic et al. have not performed any model assessment.
4. In the last paragraph Stankovic et al. state that “Although the primary aim of these models is providing approximations of outcome probabilities, they could also give insight into causality of the disease pathophysiology.” We strongly disagree with this statement, because prediction and explanation are not the same thing. A set of genes may be correlated with the occurrence of a disease, but without additional strong arguments one cannot claim causality here.
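The selection bias described in point 2, and the spread of AUC values asked for in point 1c, can be illustrated on synthetic data. The following is a minimal sketch under stated assumptions, not a reconstruction of the analysis by Stankovic et al.: the data are pure noise (the labels carry no signal), the classifier is a simple linear score built from class-mean differences, and all function names (`select_features`, `repeated_cv_auc`, etc.) are hypothetical. The only difference between the two runs is whether feature selection uses all samples once or only the training part of each fold.

```python
import random
from statistics import mean

def auc(scores, labels):
    """Rank-based AUC: P(case score > control score), ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def mean_diff(X, y, rows, j):
    """Difference of class means of feature j, using only the given rows."""
    cases = [X[i][j] for i in rows if y[i] == 1]
    controls = [X[i][j] for i in rows if y[i] == 0]
    return mean(cases) - mean(controls)

def select_features(X, y, rows, k):
    """Keep the k features with the largest absolute class-mean difference."""
    p = len(X[0])
    return sorted(range(p), key=lambda j: -abs(mean_diff(X, y, rows, j)))[:k]

def stratified_folds(y, n_folds, rng):
    """Shuffle each class separately and deal samples round-robin into folds."""
    folds = [[] for _ in range(n_folds)]
    for cls in (0, 1):
        idx = [i for i, label in enumerate(y) if label == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % n_folds].append(i)
    return folds

def repeated_cv_auc(X, y, k, select_inside, n_folds=5, n_repeats=5, seed=0):
    """Return one mean AUC per repeat of stratified k-fold cross-validation."""
    rng = random.Random(seed)
    all_rows = list(range(len(y)))
    # Selecting on all rows lets the held-out samples influence the model.
    preselected = None if select_inside else select_features(X, y, all_rows, k)
    per_repeat = []
    for _ in range(n_repeats):
        fold_aucs = []
        for test in stratified_folds(y, n_folds, rng):
            held_out = set(test)
            train = [i for i in all_rows if i not in held_out]
            feats = select_features(X, y, train, k) if select_inside else preselected
            # Linear score: weights are class-mean differences on the training rows.
            weights = {j: mean_diff(X, y, train, j) for j in feats}
            scores = [sum(w * X[i][j] for j, w in weights.items()) for i in test]
            fold_aucs.append(auc(scores, [y[i] for i in test]))
        per_repeat.append(mean(fold_aucs))
    return per_repeat

# Assumed toy setting: 40 samples, 500 noise features, 10 features kept.
data_rng = random.Random(42)
n, p, k = 40, 500, 10
X = [[data_rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [i % 2 for i in range(n)]  # balanced labels, unrelated to X

biased = repeated_cv_auc(X, y, k, select_inside=False)
unbiased = repeated_cv_auc(X, y, k, select_inside=True)
for name, values in (("selection outside CV", biased), ("selection inside CV", unbiased)):
    print(f"{name}: mean AUC {mean(values):.2f}, range {min(values):.2f} to {max(values):.2f}")
```

Because the features are noise, an honest estimate should be close to an AUC of 0.5; selecting features on the full data set before cross-validation typically gives a much higher value. Repeating the selection inside each training fold is also what the outer loop of nested cross-validation does, although this sketch does not tune any model parameters and is therefore not a full nested procedure. Printing the range across repeats is one simple way to convey variability alongside the mean.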
As we have pointed out earlier, we believe that the authors reported correct AUC values in the paper. We also believe that the authors used predictive modelling with the good intention of adding value to their findings. However, Stankovic et al. have not provided sufficient evidence to support their key conclusions. Firstly, it is not proven that variations in genes involved in immune regulation are genetic factors of importance in IBD susceptibility. Secondly, it is not proven that those genes can be used as predictors of disease development.
6. Janssens AC, Gwinn M, Valdez R, Narayan KM, Khoury MJ. Predictive genetic testing for type 2 diabetes may raise unrealistic expectations. BMJ 2006; 333: 509-10.
7. Ambroise C, McLachlan GJ. Selection bias in gene extraction on the basis of microarray gene-expression data. Proc Natl Acad Sci USA 2002; 99: 6562-6.