Iterative sure independent ranking and screening for drug response prediction
BMC Medical Informatics and Decision Making volume 20, Article number: 224 (2020)
Abstract
Background
Prediction of drug response based on multiomics data is a crucial task in the research of personalized cancer therapy.
Results
We propose an iterative sure independent ranking and screening (ISIRS) scheme to select drug response-associated features and apply it to the Cancer Cell Line Encyclopedia (CCLE) dataset. For each drug in CCLE, we incorporated multiomics data, including copy number alterations, mutations and gene expression, and selected up to 50 features using ISIRS. A linear regression model based on the selected features was then used to predict the drug response. Cross-validation tests show that our prediction accuracies are higher than those of existing methods for most drugs.
Conclusions
Our study indicates that the features selected by the marginal utility measure, which measures the conditional probability of drug responses given the feature, are helpful for drug response prediction.
Background
A major goal in cancer research is to select an efficacious drug or drug combinations for each individual patient based on their genomic and transcriptomic profiles [1]. To get a much more comprehensive understanding of the potential genetic makeup of a patient, researchers have tried multiomics data including protein concentration, gene expression and genetic mutations. However, the methodology of translating the genetic measurements to predictive models for assisting therapeutic decisions is still a challenge.
Researchers have tried many methods to find biomarkers and predict drug sensitivity. These methods are mainly based on gene expression measurements. Staunton et al. proposed a weighted voting classification strategy to classify each cell line as sensitive or resistant to each drug based on the NCI60 gene expression data [2]. Riddick et al. developed a novel multistep regression model for drug response using Random Forest [3]. However, the biomarkers of a given drug may differ across cancer types because of the heterogeneity of cancers, so it is more realistic to focus on specific cancer types. Lee et al. developed a genetic algorithm termed “co-expression extrapolation”, which can accurately predict the drug sensitivity of bladder cancer cell lines and the clinical responses of breast cancer patients treated with commonly used chemotherapeutic drugs [4]. Holleman et al. used 14,500 probe sets to identify differentially expressed genes in drug-sensitive and drug-resistant acute lymphoblastic leukemia (ALL) [5]. Besides gene expression, some researchers have paid attention to related factors, including epigenetic modifications and chemical descriptions of the drugs. Shen et al. used bisulfite PCR to assess DNA methylation and employed the methylation markers to predict drug sensitivity [6]. Chen et al. proposed a graph matching with multiple network constraints model to accurately identify gene-drug modules [7]. Wang et al. used elastic net regression to characterize the lncRNA pharmacogenomic landscape by integrating multidimensional genomic data and drug response data [8]. Additionally, Menden et al. [9] developed a machine learning model to predict the response of cancer cell lines to drug treatment based on both the genomic features of cell lines and chemical properties of the considered drugs.
In spite of the success in finding some drug biomarkers, these kinds of approaches still suffer from the typical “high-dimension but low sample size” problem in statistical learning, i.e., compared with the large number of expressed genes and chemical compounds (p), the number of samples (n) is very limited.
In a recent study, researchers from the Broad Institute of Harvard and MIT generated large-scale genomic data for cancer cell lines (termed the Cancer Cell Line Encyclopedia, CCLE). Coupled with pharmacological profiles for 24 anticancer drugs across 479 cell lines, this dataset allows identification of genetic, lineage, and gene-expression-based predictors of drug sensitivity [10]. In that paper, the authors first screened all genomic features by their marginal correlation with drug response, and then predicted drug sensitivity by elastic net regression. However, interactions between genomic features also influence drug sensitivity, so the importance of a feature may change after other features are added to the model. In order to incorporate important features with weak marginal correlation and remove marginally strong but jointly unimportant features, Fang et al. applied iterative sure independence screening (ISIS) to the CCLE dataset and improved the accuracy of drug sensitivity prediction [11]. However, feature screening based on the Pearson correlation coefficient (PCC) is sensitive to outliers and, in theory, requires the assumption of an elliptically symmetric distribution [12]. Considering the existence of outliers and the asymmetric distribution of most drug sensitivity data in CCLE, we exploited sure independent ranking and screening (SIRS) [13], a model-free feature screening approach proposed by Zhu et al. whose utility measure is based on the conditional distribution of drug response given genomic features. The iterative version of SIRS in [13] uses the residuals of the remaining features to carry out the iterations. To predict the drug response from the identified important features, we instead exploit the linearity assumption in modeling drug response and use the residuals of the response to carry out the iterations.
In this paper, we propose the iterative SIRS (ISIRS) to predict drug response and apply it to the CCLE dataset. Through the iterative procedure of ISIRS, strong features with marginally weak utility measures have a chance to be recruited, and weak features with marginally strong measures can be removed. The cross-validation tests showed that the prediction accuracy of our method was higher than that of ISIS, STF and SIRS for most drugs in the CCLE dataset. Compared with ISIS, the feature importance by PCC showed that ISIRS is robust to outliers and relaxes the assumption of a symmetric distribution. Additionally, we also detected some new drug response-related genomic features.
Methods
Datasets
The drug response and cancer genomic data used in the present paper are available from the Cancer Cell Line Encyclopedia (CCLE). This dataset contains copy number alterations, gene expression and mutation status for 947 human cancer cell lines, as well as 8-point dose-response curves for 24 chemical compounds across 479 cell lines. We used the area under the dose-response curve (termed Activity area in [10]) instead of EC50 or IC50 to measure the sensitivity of a given cell line to a drug. This measure has been used extensively because it captures both the efficacy and the potency of a drug [14].
Screening procedure
For each cell line, the expression profiles of 20,069 genes, the mutation status of 1654 genes and the copy number status of 16,045 genes are integrated as the primary feature vector. The dimension of the primary feature vector is very high compared with the sample size. In this paper, sure independent ranking and screening [13] was introduced to marginally select drug response-related features.
Let Y be the drug response value and x = (X_{1}, …, X_{p})^{T} be the vector consisting of all candidate features. Without loss of generality, we assume that E(X_{k}) = 0 and var(X_{k}) = 1 for k = 1, …, p after scaling. As suggested in reference [13], we denote the conditional distribution function of Y given x by F(y | x) = P(Y < y | x) and define Ω(y) = E{xF(y | x)}. Let Ω_{k}(y) be the kth element of Ω(y); then ω_{k} is defined as
\( {\omega}_k=E\left\{{\Omega}_k^2(Y)\right\}, \) k = 1, …, p.
In the present paper, we take ω_{k} as the marginal utility measure. The predictor X_{k} is called an active predictor if F(y | x) functionally depends on X_{k}, and an inactive predictor otherwise [13]. Under the conditions given in [13], the ranking consistency property ensures that the marginal utility measure ω_{k} ranks an active predictor above an inactive predictor. This guarantees that we can select the active predictors and exclude the inactive predictors [13]. Thus we used ω_{k} for feature screening. For real data, the sample counterpart of ω_{k} is obtained as follows.
Let {(x_{i}, Y_{i}), i = 1, …, n} be a random sample of {x, Y}. First, we normalized the sample predictors such that \( {n}^{-1}{\sum}_{i=1}^n{X}_{ik}=0 \) and \( {n}^{-1}{\sum}_{i=1}^n{X}_{ik}^2=1 \) for k = 1, …, p. Then the sample estimator for ω_{k} is
\( {\hat{\omega}}_k=\frac{1}{n}{\sum}_{j=1}^n{\left\{\frac{1}{n}{\sum}_{i=1}^n{X}_{ik}I\left({Y}_i<{Y}_j\right)\right\}}^2, \) k = 1, …, p,
where X_{ik} is the kth component of x_{i} and I(·) is the indicator function. According to the descending order of \( {\hat{\omega}}_k \), all the candidate predictors (features) can be ranked and the top ones are recruited.
Feature selection through iterative sure independent ranking and screening
All feature screening methods based on marginal utility measure suffer from an inherent drawback that they may miss the predictors that are marginally insignificant but jointly related with the response. Based on this observation, we further propose the scheme of iterative sure independent ranking and screening (ISIRS) as follows.
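First, we rank all features by sorting \( {\hat{\omega}}_k \) as above in descending order and select the top K_{1} features as A_{1}. Then we carry out lasso regression based on a linear model for variable selection and obtain a subset M_{1} of A_{1}. That is, we minimize the objective function
\( \frac{1}{2n}{\sum}_{i=1}^n{\left({Y}_i-{\beta}_0-{\sum}_{k\in {A}_1}{X}_{ik}{\beta}_k\right)}^2+\lambda {\sum}_{k\in {A}_1}\left|{\beta}_k\right|, \)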
where X_{ik} is the kth component of the feature vector x_{i}, Y_{i} is the ith observation of drug response, β_{0} and β_{k} are the regression coefficients estimated by lasso, n is the sample size and λ is the penalty tuning parameter. Lasso regression gives shrinkage estimates, and some β_{k} can be estimated as exactly zero. The features with nonzero coefficients are retained, and their index set is denoted by M_{1}. We use the notation ∣M_{1}∣ for the number of features in M_{1}. Next, we fit the drug response on the features in M_{1} by a linear regression model and obtain the residuals. We then take the residuals as a new response and employ SIRS to select the index set A_{2} from the remaining features with indices {1, 2, …, p}\M_{1}. In the next step, for the features in the union of A_{2} and M_{1}, we apply lasso regression again and obtain a subset of features, denoted by M_{2}. Assuming that we aim to select d features, the process of feature screening and selection is repeated until ∣M_{s}∣ = d or M_{s} = M_{s − 1}. To make sure that this procedure does not stop at the first iteration, we set K_{1} = \( \left[\frac{2d}{3}\right] \) as suggested in [9]. In addition, lasso cannot achieve consistency in variable selection and in parameter estimation at the same time. So, similar to Fang et al. [11], we fit a linear regression model on the features selected by ISIRS and predict drug response with the ordinary least squares (OLS) estimates.
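The iterative scheme described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' code: the lasso is solved by a plain coordinate-descent routine on the objective scaled by 1/(2n), the least-squares refit reuses the same routine with λ = 0, each round recruits d − ∣M_{s − 1}∣ new candidates (the text does not state this size), and the names `lasso_cd`, `isirs` and the toy data are hypothetical.

```python
import random

def standardize(col):
    """Center a feature and scale it so that (1/n) * sum(x**2) == 1."""
    n = len(col)
    mu = sum(col) / n
    sd = (sum((v - mu) ** 2 for v in col) / n) ** 0.5
    return [(v - mu) / sd if sd > 0 else 0.0 for v in col]

def sirs_utility(x, resp):
    """Sample SIRS utility of a standardized feature x for the response resp."""
    n = len(resp)
    return sum((sum(xi for xi, yi in zip(x, resp) if yi < yj) / n) ** 2
               for yj in resp) / n

def lasso_cd(cols, y, lam, sweeps=200):
    """Coordinate descent for (1/2n)*RSS + lam*sum|beta| on standardized columns."""
    n = len(y)
    b0 = sum(y) / n                      # intercept, since columns are centered
    beta = [0.0] * len(cols)
    resid = [yi - b0 for yi in y]
    for _ in range(sweeps):
        for k, col in enumerate(cols):
            rho = sum(c * r for c, r in zip(col, resid)) / n + beta[k]
            new = max(abs(rho) - lam, 0.0) * (1 if rho > 0 else -1)  # soft threshold
            if new != beta[k]:
                resid = [r - (new - beta[k]) * c for r, c in zip(resid, col)]
                beta[k] = new
    return b0, beta

def isirs(columns, y, d, lam, max_rounds=10):
    """Return indices of at most d features chosen by the iterative scheme."""
    cols = [standardize(c) for c in columns]

    def top(resp, candidates, k):        # SIRS ranking step
        ranked = sorted(candidates, key=lambda j: sirs_utility(cols[j], resp),
                        reverse=True)
        return ranked[:k]

    def lasso_select(idx):               # keep features with nonzero coefficients
        _, beta = lasso_cd([cols[j] for j in idx], y, lam)
        return [j for j, b in zip(idx, beta) if b != 0.0]

    selected = lasso_select(top(y, range(len(cols)), max(1, (2 * d) // 3)))
    for _ in range(max_rounds):
        if len(selected) >= d:
            break
        # Least-squares refit, approximated by the same solver with lam = 0.
        b0, beta = lasso_cd([cols[j] for j in selected], y, 0.0)
        resid = [yi - b0 - sum(b * cols[j][i] for b, j in zip(beta, selected))
                 for i, yi in enumerate(y)]
        rest = [j for j in range(len(cols)) if j not in selected]
        recruits = top(resid, rest, d - len(selected))   # assumed size of A_s
        updated = lasso_select(selected + recruits)
        if set(updated) == set(selected):
            break
        selected = updated
    return selected[:d]

# Toy data (hypothetical): the response depends on features 0 and 1 only.
rng = random.Random(0)
n, p = 60, 8
columns = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
y = [3 * columns[0][i] - 2 * columns[1][i] + 0.1 * rng.gauss(0, 1) for i in range(n)]
selected = isirs(columns, y, d=3, lam=0.3)
print(sorted(selected))
```

In practice λ would be tuned, for example by cross-validation, and a dedicated lasso solver would replace the hand-written loop; the sketch only shows how screening on residuals and lasso selection alternate.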
Cross-validation and evaluation
In statistical prediction, cross-validation is often adopted to test the effectiveness of a predictor [15]. In this paper, we performed 10-fold cross-validation to validate our algorithm. In detail, all cell lines treated with one drug were divided into ten roughly equal groups; in each fold, one group served as the test set and the remaining nine groups served as the training set. The average performance across all ten folds was taken as the final predictive value of drug response.
The Pearson correlation coefficient (PCC) between the average of the predicted values and the observed response was used to evaluate the predictive performance, as has been widely done in the literature [16]. Besides the PCC, we also calculated the mean squared error (MSE) of the averaged predicted values from the 10-fold cross-validation to assess the predictive performance of ISIRS.
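The evaluation protocol can be illustrated with a short Python sketch. It shows a single repetition of k-fold cross-validation scored by PCC and MSE; the helper names (`kfold_indices`, `cross_validate`, `fit_line`, `predict_line`) and the one-feature stand-in model are assumptions made for illustration and are not taken from the paper.

```python
import random

def kfold_indices(n, k=10, seed=0):
    """Shuffle 0..n-1 and deal the indices into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5

def mse(pred, obs):
    """Mean squared error between predictions and observations."""
    return sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)

def cross_validate(x, y, fit, predict, k=10, seed=0):
    """Collect out-of-fold predictions, then score them by PCC and MSE."""
    preds = [0.0] * len(y)
    for test in kfold_indices(len(y), k, seed):
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit([x[i] for i in train], [y[i] for i in train])
        for i in test:
            preds[i] = predict(model, x[i])
    return pearson(preds, y), mse(preds, y)

# Stand-in model (hypothetical): simple least squares on one feature.
def fit_line(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def predict_line(model, x_new):
    slope, intercept = model
    return slope * x_new + intercept

x = [float(i) for i in range(30)]
y = [2.0 * v + 1.0 for v in x]
pcc, err = cross_validate(x, y, fit_line, predict_line)
print(pcc, err)
```

To mimic repeated cross-validation, the call could be repeated with different `seed` values and the out-of-fold predictions averaged before computing the PCC and MSE. In a pipeline like the one described here, `fit` would presumably wrap both feature selection and the OLS refit so that the held-out fold is not used during selection.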
T-test for the significance of regression coefficients
In a multiple regression model, all features explain the response (drug sensitivity) collectively, and the explanatory effect is not simply the sum of the marginal explanatory effects. Some features may therefore have weak marginal importance but be jointly related to the response.
In order to examine the importance of these marginally weak features, a t-test was applied to test the significance of the corresponding regression coefficient. If the coefficient is significantly different from zero, the feature is important jointly with the other features. The testing problem for the jth feature can therefore be described as follows:
\( {H}_0:{\beta}_j=0\kern1em \mathrm{versus}\kern1em {H}_1:{\beta}_j\ne 0. \)
Combined with the regression model, the above testing problem is essentially equivalent to comparing the models
\( {H}_0:{Y}_i={\beta}_0+{\sum}_{k\ne j}{\beta}_k{X}_{ik}+{\varepsilon}_i\kern1em \mathrm{versus}\kern1em {H}_1:{Y}_i={\beta}_0+{\sum}_{k\ne j}{\beta}_k{X}_{ik}+{\beta}_j{X}_{ij}+{\varepsilon}_i. \)
When we reject H_{0}, it means that the model in H_{1} with X_{ij} can explain the response Y_{i} better than the model in H_{0} without X_{ij}, and the existence of the feature X_{ij} is significant. In this paper, the significance level was set to be 0.01.
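For reference, a standard form of the t statistic used for such a test in ordinary least squares is sketched below. The notation is introduced here for illustration and is not taken from the paper: q is the number of features in the fitted model, \( \mathbf{X} \) is the n × (q + 1) design matrix including the intercept column, \( {\hat{Y}}_i \) is the fitted value and \( {\hat{\sigma}}^2 \) is the residual variance estimate.

\( {t}_j=\frac{{\hat{\beta}}_j}{\hat{\sigma}\sqrt{{\left[{\left({\mathbf{X}}^T\mathbf{X}\right)}^{-1}\right]}_{jj}}},\kern1em {\hat{\sigma}}^2=\frac{1}{n-q-1}{\sum}_{i=1}^n{\left({Y}_i-{\hat{Y}}_i\right)}^2. \)

Under H_{0} and the usual normal-error assumptions, t_{j} follows a t distribution with n − q − 1 degrees of freedom, so H_{0} would be rejected at the 0.01 level when ∣t_{j}∣ exceeds the corresponding two-sided critical value.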
Results and discussion
Determination of the number of selected features
To determine the number of selected features d, we explored the predictive performance for each drug with different numbers of top features selected by ISIRS. The Pearson correlation coefficient between predicted and true response values does not show significant improvement when more than 50 features are selected. This may be because increasing the number of features also increases the noise. Therefore, we consider at most 50 selected features for each drug. The PCCs based on 10-fold cross-validation for four example drugs are shown in Fig. 1, and the results for the remaining drugs are shown in Fig. S1.
We set the evenly spaced grid {2, 4, 6, …, 50} for d and performed 10 iterations of the 10-fold cross-validation using the ISIRS scheme. The grid point corresponding to the largest PCC between the observed drug responses and the values predicted via 10-fold cross-validation was taken as the optimal choice for d. For each drug, the final selected features are shown in Table S1.
Analysis of selected features in ISIRS model applying in CCLE dataset
As shown in Table S1, many selected features overlap significantly with those selected by elastic net regression (ENR). As with ENR, most selected features are gene expression features rather than mutation or copy number alteration status, which is expected since expression profiles constitute the majority of the original features. Most of the selected features are widely accepted indicators of drug response. For example, the selected mutation features for AZD6244 and PD.0325901 include BRAF and NRAS, which are known predictors of sensitivity to MEK inhibitors. Mutation of BRAF is also ranked as the top feature for PLX4720 (a BRAF inhibitor). SLFN11 expression correlates with the antiproliferative activity of topoisomerase I (Top1) inhibitors in the NCI60 [17], and Irinotecan is a DNA topoisomerase I inhibitor. These powerful features are also successfully selected as predictors of drug response by ENR [8].
The target of Paclitaxel is beta-tubulin, and it acts as a microtubule-stabilizing agent. The gene ABCB1 was selected as a strong feature by ISIRS. The mechanisms of resistance to this class of compounds include overexpression of the drug efflux pump protein ABCB1, microtubule cytoskeletal changes, and overexpression of specific β-tubulin isotypes and microtubule-associated proteins. Microtubule-stabilizing agents, such as epothilones, have demonstrated similar activity in ABCB1-overexpressing cells [18]. However, ABCB1 was not selected as a strong feature by ISIS or ENR. Nilotinib is a selective BCR-ABL tyrosine kinase inhibitor, and BCR-ABL1-positive disease occurs in adult acute lymphoblastic leukemia (ALL). We selected the gene IKZF1 as an active feature. It is reported that IKZF1 deletions are likely to be a genomic alteration that significantly affects the prognosis of Ph-positive ALL in adults [19]. AZD0530 is a potent Src family kinase (SFK)/Abl dual-kinase inhibitor. It is reported that RSF1 is an amplified gene in highly aggressive ovarian serous carcinoma. Increased RSF1 expression, and thus excessive RSF activity, can induce chromosomal instability, likely through DDR [20]. The gene RSF1 was also selected by our method for AZD0530. PHA.665752 is a c-MET inhibitor. For PHA.665752, ISIRS selects the mutation of RHOA as a feature, which could regulate the co-endocytosis of cadherin and c-Met [21]. We selected the mutation feature AURKC as an active feature for Panobinostat, which has been used in combination with other chemotherapy for children with relapsed AML. The AURKs are serine kinases that are involved mainly in checkpoint regulation in the cell cycle. Three mammalian AURKs have been identified: AURKA, AURKB, and AURKC [22]. Some selective inhibitors of AURKA and AURKB have been used in AML treatment. All these features share a common characteristic: their ω_{k} rankings are very low, but they are significant according to the regression model.
So we can conclude that ISIRS could detect some weak features that jointly correlate with drug response.
To verify the relationship between drugs and the selected genes, we conducted a functional enrichment analysis of the selected genes using the online Metascape tool [23], taking X17.AAG as an example. The results are shown in Fig. 2. For X17.AAG, the 43 genes selected by ISIRS are significantly enriched in six function terms. The most significant GO term is regulation of steroid biosynth, which contains the genes LBH and NADH. It has been reported that steroid sulfatases convert locally inactive estrogens to their active forms and thus support the growth of breast cancer cells [24]. LBH is considered an oncogene directly regulated by the Wnt/β-catenin pathway, and the overexpression of LBH leads to a more aggressive basal differentiation of breast cancer [25]. The gene NADH is known to halt the progression of breast cancer cells. This is because NADH supplies cellular ATP, and cancer cells cannot grow in an ATP-rich environment [26]. In addition, the efficacy of the inhibitor X17.AAG was tested in breast cancer cell lines, and X17.AAG was shown to inhibit the growth of breast cancer cells in an in vitro study [27].
Comparison with existing methods
To evaluate the performance of the ISIRS model, we compared it with iterative sure independence screening (ISIS), sure independent ranking and screening (SIRS) and simple top features (STF), using the Pearson correlation coefficient as the comparative measure. The Pearson correlation coefficients between true and predicted drug responses by ISIRS, ISIS, SIRS, and STF are reported in Table S2 and shown as a bar chart in Fig. 3. Our predictions were slightly better than those of STF and almost the same as those of ISIS and SIRS. For instance, the mean increase in the Pearson correlation coefficient of ISIRS was close to 0.03 compared with STF. Compared with ISIS and SIRS, the overall increases were about 0.014 and 0.011, respectively. Specifically, compared with ISIS, the PCC increased by 0.05 for L.685458 (from 0.52 to 0.57), by 0.12 for Nilotinib (from 0.42 to 0.54), by 0.05 for Paclitaxel (from 0.55 to 0.60) and by 0.04 for TKI258 (from 0.42 to 0.46). Compared with SIRS, the PCC of 17.AAG increased from 0.40 to 0.48, and the increases were 0.08 for Paclitaxel and 0.06 for ZD.6474. In addition, the predicted correlations by ISIRS are higher than those by STF (paired Wilcoxon test, p-value = 7.09e-05). ISIRS also gives higher predicted correlations than ISIS (paired Wilcoxon test, p-value = 0.02074), and the performance of STF and ISIS is comparable, as expected (paired Wilcoxon test, p-value = 0.02151). Meanwhile, ISIRS also gives higher predicted correlations than ENR (paired Wilcoxon test, p-value = 0.0004297). We conclude that the ISIRS method can identify some marginally weak features and achieve better results than the other methods.
Besides the PCC, we also calculated the mean squared error (MSE) of the averaged predicted values from the 10-fold cross-validation to assess the predictive performance of ISIRS; the results are shown as a bar plot in Fig. 4. From the bar plot, we can observe that the MSEs of ISIRS are lower than those of ISIS for all drugs (Wilcoxon test, p-value = 5.96e-08). We can also clearly observe that some of the MSEs from ISIS are much higher than those from ISIRS. Notably, the distributions of the true drug sensitivity of these drugs are all close to the normal distribution. We found that the proportional reduction in MSE is negatively correlated with skewness (Fig. 5). It is well known that results tend to be more reliable when the data follow a Gaussian distribution. Therefore, when the true drug sensitivity values follow a Gaussian distribution, the results of ISIRS will be better.
Next, we analyzed the correlations of predicted drug responses between our ISIRS model and ENR, ISIS, STF and SIRS (Fig. 6). Our predictions were in good agreement with those of the ENR model, with an overall correlation of 0.90. In particular, if we exclude the single outlier, Nilotinib, the overall Pearson correlation increases from 0.90 to 0.95. Our predictions were also in close agreement with those of the ISIS, STF and SIRS models, with an overall correlation of 0.97. From Fig. 6, we can see that the ENR model gave a higher prediction correlation than ISIRS for Nilotinib. This is because Nilotinib is a compound used specifically for treating chronic myelogenous leukemia (CML) [28], and CML was successfully selected as the strongest feature for Nilotinib sensitivity [8]. This top feature dominated model building and prediction in the ENR model and produced a high prediction correlation. Apart from this outlier, ISIRS obtained higher predictive correlations than ENR while using fewer features.
We claimed that our ISIRS approach could efficiently eliminate redundancy among the selected features. The mean redundancy score (MRS) [29], measured by the PCC and by mutual information (MI) [30], was used to evaluate the redundancy between the identified features. For comparison, we also implemented the simple top features (STF) method by ranking the features through ω_{k}, where the number of features in STF is the same as that in ISIRS. The MRSs for the 24 drugs obtained by ISIRS and STF are listed in Table S3. When measured by PCC, the MRSs of ISIRS and STF are 0.2138 and 0.4420, respectively. This suggests that feature redundancy is significantly reduced by ISIRS compared with STF (p-value = 2.384e-07 by paired Wilcoxon test). Moreover, when measured by MI, the mean MIs of ISIRS and STF for the 24 drugs are 0.0830 and 0.2042, respectively, which also differ significantly by paired Wilcoxon test (p-value = 4.712e-07). Details of the mutual information (MI) are shown in Table S4. All of the above results confirm that ISIRS can remove redundancy between the selected features.
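A PCC-based redundancy score of this kind can be sketched as follows. The sketch assumes that the score is the mean absolute PCC over all unordered pairs of selected features; this definition is an assumption made here for illustration, and the exact definition used by the authors should be taken from [29]. The function names and toy features are hypothetical.

```python
from itertools import combinations

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5

def mean_redundancy_score(columns):
    """Assumed definition: average |PCC| over all unordered feature pairs."""
    pairs = list(combinations(columns, 2))
    return sum(abs(pearson(a, b)) for a, b in pairs) / len(pairs)

# Toy features (hypothetical): the first two are collinear, the third is
# uncorrelated with both of them.
features = [[1, 2, 3, 4], [2, 4, 6, 8], [1, -1, -1, 1]]
mrs = mean_redundancy_score(features)
print(mrs)
```

A lower score indicates that the selected features carry less overlapping information, which is the sense in which the comparison between ISIRS and STF is made above.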
Conclusion
Predicting drug response from genomic data, including gene expression, gene mutation status and copy number alteration, is a fundamental problem in personalized medicine research. In this paper, we applied iterative sure independent ranking and screening (ISIRS) to select features. Through cross-validation on the CCLE dataset, we showed that our method could not only find numerous biomarkers reported in the previous literature, but also detect many marginally weak yet biologically important genomic features. These newly detected features are shown to have strong combined effects on drug response. Based on the selected features, we fitted a linear regression model to predict the drug response on the CCLE dataset. The Pearson correlation coefficients between predicted and true drug sensitivities showed that our algorithm achieved higher correlations than ENR, ISIS, STF and SIRS for most drugs. In the future, we plan to provide a web server that implements the ISIRS prediction method described in this paper.
Availability of data and materials
All data analyzed during this study are included in Dataset files.
Dataset file 1: Drug response data in CCLE. Dataset file 2: Mutation status data in CCLE.
Dataset file 3: Copy number alteration data in CCLE. Dataset file 4: Gene expression data in CCLE.
Abbreviations
CCLE: Cancer Cell Line Encyclopedia
ISIRS: Iterative sure independent ranking and screening
ALL: Acute lymphoblastic leukemia
ISIS: Iterative sure independence screening
SIRS: Sure independent ranking and screening
STF: Simple top features
OLS: Ordinary least squares
PCC: Pearson correlation coefficient
MSE: Mean squared errors
MRS: Mean redundancy score
MI: Mutual information
References
1. Pal R, Berlow N, Haider S. Anticancer drug sensitivity analysis: an integrated approach applied to erlotinib sensitivity prediction in the CCLE database. Proceedings 2012 IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS). 2012;9–12.
2. Staunton JE, Slonim DK, Coller HA, Tamayo P, Angelo MJ, Park J, Scherf U, Lee JK, Reinhold WO, Weinstein JN, et al. Chemosensitivity prediction by transcriptional profiling. Proc Natl Acad Sci U S A. 2001;98(19):10787–92.
3. Riddick G, Song H, Ahn S, Walling J, Borges-Rivera D, Zhang W, Fine HA. Predicting in vitro drug sensitivity using random forests. Bioinformatics. 2011;27(2):220–4.
4. Lee JK, Havaleshko DM, Cho H, Weinstein JN, Kaldjian EP, Karpovich J, Grimshaw A, Theodorescu D. A strategy for predicting the chemosensitivity of human cancers and its application to drug discovery. Proc Natl Acad Sci U S A. 2007;104(32):13086–91.
5. Holleman A, Cheok MH, den Boer ML, Yang W, Veerman AJ, Kazemier KM, Pei D, Cheng C, Pui CH, Relling MV. Gene-expression patterns in drug-resistant acute lymphoblastic leukemia cells and response to treatment. N Engl J Med. 2004;351(6):533–42.
6. Shen L, Kondo Y, Ahmed S, Boumber Y, Konishi K, Guo Y, Chen X, Vilaythong JN, Issa JP. Drug sensitivity prediction by CpG island methylation profile in the NCI-60 cancer cell line panel. Cancer Res. 2007;67(23):11335–43.
7. Chen J, Peng H, Han G, Cai H, Cai J. HOGMMNC: a higher order graph matching with multiple network constraints model for gene-drug regulatory modules identification. Bioinformatics. 2019;35(4):602–10.
8. Wang Y, Wang Z, Xu J, Li J, Li S, Zhang M, Yang D. Systematic identification of non-coding pharmacogenomic landscape in cancer. Nat Commun. 2018;9(1):3192.
9. Menden MP, Iorio F, Garnett M, McDermott U, Benes CH, Ballester PJ, Saez-Rodriguez J. Machine learning prediction of cancer cell sensitivity to drugs based on genomic and chemical properties. PLoS One. 2013;8(4):e61318.
10. Barretina J, Caponigro G, Stransky N, Venkatesan K, Margolin AA, Kim S, Wilson CJ, Lehar J, Kryukov GV, Sonkin D, et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature. 2012;483(7391):603–7.
11. Fang Y, Qin Y, Zhang N, Wang J, Wang H, Zheng X. DISIS: prediction of drug response through an iterative sure independence screening. PLoS One. 2015;10(3):e0120408.
12. Fan J, Lv J. Sure independence screening for ultrahigh dimensional feature space. J Royal Statistical Society: Series B (Statistical Methodology). 2008;70(5):849–911.
13. Zhu L, Li L, Li R, Zhu L. Model-free feature screening for ultrahigh dimensional data. J Am Stat Assoc. 2011;106(496):1464–75.
14. Sebaugh JL. Guidelines for accurate EC50/IC50 estimation. Pharm Stat. 2011;10(2):128–34.
15. Chen W, Lin H, Feng PM, Ding C, Zuo YC, Chou KC. iNuc-PhysChem: a sequence-based predictor for identifying nucleosomes via physicochemical properties. PLoS One. 2012;7(10):e47843.
16. Yao J, Chang C, Salmi ML, Hung YS, Loraine A, Roux SJ. Genome-scale cluster analysis of replicated microarrays using shrinkage correlation coefficient. BMC Bioinformatics. 2008;9:288.
17. Zoppoli G, Regairaz M, Leo E, Reinhold WC, Varma S, Ballestrero A, Doroshow JH, Pommier Y. Putative DNA/RNA helicase Schlafen-11 (SLFN11) sensitizes cancer cells to DNA-damaging agents. Proc Natl Acad Sci. 2012;109(37):15030–5.
18. Wu S, Guo Z, Hopkins CD, Wei N, Chu E, Wipf P, Schmitz JC. Bis-cyclopropane analog of disorazole C1 is a microtubule-destabilizing agent active in ABCB1-overexpressing human colon cancer cells. Oncotarget. 2015;6(38):40866.
19. Martinelli G, Iacobucci I, Storlazzi CT, Vignetti M, Paoloni F, Cilloni D, Soverini S, Vitale A, Chiaretti S, Cimino G. IKZF1 (Ikaros) deletions in BCR-ABL1–positive acute lymphoblastic leukemia are associated with short disease-free survival and high rate of cumulative incidence of relapse: a GIMEMA AL WP report. J Clin Oncol. 2009;27(31):5202–7.
20. Sheu JJ, Guan B, Choi JH, Lin A, Lee CH, Hsiao YT, Wang TL, Tsai FJ, Shih Ie M. Rsf-1, a chromatin remodeling protein, induces DNA damage and promotes genomic instability. J Biol Chem. 2010;285(49):38260–9.
21. Kamei T, Matozaki T, Sakisaka T, Kodama A, Yokoyama S, Peng YF, Nakano K, Takaishi K, Takai Y. Coendocytosis of cadherin and c-Met coupled to disruption of cell-cell adhesion in MDCK cells: regulation by Rho, Rac and Rab small G proteins. Oncogene. 1999;18(48):6776–84.
22. Yu MG, Zheng HY. Acute myeloid leukemia: advancements in diagnosis and treatment. Chin Med J. 2017;130(2):211–8.
23. Zhou Y, Zhou B, Pache L, Chang M, Khodabakhshi AH, Tanaseichuk O, Benner C, Chanda SK. Metascape provides a biologist-oriented resource for the analysis of systems-level datasets. Nat Commun. 2019;10(1):1253.
24. Kozak W, Dasko M, Maslyk M, Kubinski K, Rachon J, Demkowicz S. Steroid sulfatase inhibitors based on phosphate and thiophosphate flavone analogs. Drug Dev Res. 2015;76(8):450–62.
25. Yu R, Li Z, Zhang C, Song H, Deng M, Sun L, Xu L, Che X, Hu X, Qu X, et al. Elevated limb-bud and heart development (LBH) expression indicates poor prognosis and promotes gastric cancer cell proliferation and invasion via upregulating integrin/FAK/Akt pathway. PeerJ. 2019;7:e6885.
26. Santidrian AF, Matsuno-Yagi A, Ritland M, Seo BB, LeBoeuf SE, Gay LJ, Yagi T, Felding-Habermann B. Mitochondrial complex I activity and NAD+/NADH balance regulate breast cancer progression. J Clin Invest. 2013;123(3):1068–81.
27. Ghadban T, Jessen A, Reeh M, Dibbern JL, Mahner S, Mueller V, Wellner UF, Gungor C, Izbicki JR, Vashist YK. In vitro study comparing the efficacy of the water-soluble HSP90 inhibitors, 17-AEPGA and 17-DMAG, with that of the non-water-soluble HSP90 inhibitor, 17-AAG, in breast cancer cell lines. Int J Mol Med. 2016;38(4):1296–302.
28. DeRemer DL, Ustun C, Natarajan K. Nilotinib: a second-generation tyrosine kinase inhibitor for the treatment of chronic myelogenous leukemia. Clin Ther. 2008;30(11):1956–75.
29. Ren X, Wang Y, Chen L, Zhang XS, Jin Q. EllipsoidFN: a tool for identifying a heterogeneous set of cancer biomarkers based on gene expressions. Nucleic Acids Res. 2012;41(4):e53.
30. Peng H, Long F, Ding C. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell. 2005;27(8):1226–38.
Acknowledgements
Not applicable.
About this supplement
This article has been published as part of BMC Medical Informatics and Decision Making Volume 20 Supplement 8, 2020: Selected articles from the 15th International Symposium on Bioinformatics Research and Applications (ISBRA19): medical informatics and decision making. The full contents of the supplement are available online at https://0bmcmedinformdecismakbiomedcentralcom.brum.beds.ac.uk/articles/supplements/volume20supplement8
Funding
This work was partially supported by the National Natural Science Foundation of China (61702325), National Key R&D Program Projects (2018YFD0701003) and Shanghai Science and Technology Innovation Action Plan (16391902900). Publication costs were funded by the National Natural Science Foundation of China (61702325).
Author information
Contributions
BA conducted experiments, performed data analysis, wrote paper; QWZ assembled input data, wrote paper; YF designed study; MC conceived study; YFQ designed study, analyzed results. All authors read and approved of the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Additional file 1: Table S1
The selected variables and ω_{k} values for 24 drugs.
Additional file 2: Table S2
Pearson correlation coefficients between true and predicted drug responses by ISIRS, ENR, ISIS, SIRS and STF.
Additional file 3: Table S3
The mean redundancy score measured by PCC for the 24 drugs by ISIS, STF and ISIRS.
Additional file 4: Table S4
The mean redundancy score measured by MI for the 24 drugs by ISIS, STF and ISIRS.
Additional file 5: Fig. S1
Pearson correlation coefficients of predicted and true drug sensitivities at different numbers of recruited features for the other 20 drugs.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
An, B., Zhang, Q., Fang, Y. et al. Iterative sure independent ranking and screening for drug response prediction. BMC Med Inform Decis Mak 20, 224 (2020). https://0doiorg.brum.beds.ac.uk/10.1186/s12911020012409
DOI: https://0doiorg.brum.beds.ac.uk/10.1186/s12911020012409
Keywords
 SIRS
 Drug response
 ISIRS
 CCLE