Table 1 Dirichlet process mixture models (DPMMs)

From: Dirichlet process mixture models to impute missing predictor data in counterfactual prediction models: an application to predict optimal type 2 diabetes therapy

A DPMM is commonly used as a prior distribution over the components of a (possibly multivariate) mixture model of unknown complexity [16]. A DPMM comprises a theoretically infinite number of components, each parameterised by a specific functional form with component-specific parameters. Instead of fitting an infinite number of components, we use a truncated DPMM with a maximum number of components K, based on the assumption that the optimal number of components is lower than this limit [17]; this reduces the computational demands of fitting the model. The value of K should increase with the complexity of the data, and its suitability can be checked after the model is fitted.
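The truncated weights are typically generated by a stick-breaking construction, which the following sketch illustrates; the function name and concentration parameter `alpha` are our illustrative choices, not notation from the paper.

```python
import numpy as np

def stick_breaking_weights(alpha, K, seed=None):
    """Draw truncated stick-breaking weights p_1, ..., p_K for a DPMM.

    alpha is the Dirichlet process concentration parameter and K the
    truncation level. Setting the final stick fraction to 1 assigns all
    remaining mass to component K, so the weights sum to one.
    """
    rng = np.random.default_rng(seed)
    v = rng.beta(1.0, alpha, size=K)       # stick fractions v_k ~ Beta(1, alpha)
    v[-1] = 1.0                            # truncation: last stick takes the rest
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining                   # p_k = v_k * prod_{m<k} (1 - v_m)

p = stick_breaking_weights(alpha=1.0, K=20, seed=0)
```

With a larger `alpha`, the mass spreads over more components, which is one way to reason about a suitable truncation level K.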

    The DPMM thus defines a weighted sum of K component densities [18]. The component densities are restricted to parametric classes of densities assumed appropriate for the data at hand. We define \(f_k\left( \textbf{X} \mid \varvec{\Theta }_k \right)\) as the \(k^{\text{ th }}\) component density, with \(\varvec{\Theta }_k\) representing the component parameters. A K-component mixture density is then defined as:

\[f\left( \textbf{X} \mid \varvec{\Theta } \right) = \sum \limits _{k = 1}^{K} p_k \, f_k\left( \textbf{X} \mid \varvec{\Theta }_k \right)\]

where \(p_k\) are component-specific weights such that \(\sum \nolimits _{k = 1}^K p_k = 1\) [19].
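A minimal sketch of this weighted sum, using multivariate Gaussian components as an illustrative choice of \(f_k\) (the function and argument names are ours, not the paper's):

```python
import numpy as np
from scipy.stats import multivariate_normal

def mixture_density(x, weights, mus, Sigmas):
    """Evaluate a K-component Gaussian mixture density at point x.

    weights: length-K mixing proportions p_k summing to 1
    mus, Sigmas: component means and covariances (Theta_k)
    """
    return sum(p_k * multivariate_normal.pdf(x, mean=mu_k, cov=S_k)
               for p_k, mu_k, S_k in zip(weights, mus, Sigmas))

# example: equal-weight mixture of two standard normals in one dimension
val = mixture_density(0.0, weights=[0.5, 0.5], mus=[0.0, 0.0], Sigmas=[1.0, 1.0])
# val ≈ 0.3989, the standard normal density at zero
```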

    For \(J_{C}\) continuous predictors, we use mixtures of multivariate Gaussian distributions with \(J_C\) dimensions; the component-specific parameters for component k (\(k = 1,\dots ,K\)) are given by \(\left( \varvec{\mu } _{k}, {\varvec{\Sigma }} _{k}\right)\), where \(\varvec{\mu }_{k}\) is a \(J_C\)-vector of means and \(\varvec{\Sigma }_{k}\) is a \((J_C \times J_C)\) covariance matrix. For \(J_D\) categorical predictors, we use mixtures of categorical probability mass functions. Where covariate j (\(j = 1,\dots ,J_D\)) has \(K_{j}\) categories, the component-specific parameters are the probabilities of belonging to each category, given by \(\varvec{\phi }_{k} = (\varvec{\phi }_{k1}, \varvec{\phi }_{k2}, \dots , \varvec{\phi }_{kJ_D})\) with \(\varvec{\phi }_{kj}=(\phi _{kj1},\phi _{kj2},\dots ,\phi _{kjK_{j}})\) and \(\sum \nolimits _{l = 1}^{K_j} \phi _{kjl} = 1\). The model in this paper is a mixture over both continuous and categorical variables, with \(\textbf{X}_{i} = \left( \textbf{X}^{C}_{i},\textbf{X}^{D}_{i}, X^T_i\right)\), where \(\textbf{X}^{C}_{i}\) denotes the continuous predictors, \(\textbf{X}^{D}_{i}\) the categorical predictors and \(X^{T}_{i}\) the treatment taken. Hence, in the notation of the Study overview section, \(\varvec{\Theta } = \left( \varvec{\Theta }_1, \dots , \varvec{\Theta }_K, p_1, \dots , p_K\right)\), where the component-specific parameters are \(\varvec{\Theta }_{k} = (\varvec{\mu }_{k}, {\varvec{\Sigma }}_{k}, \varvec{\phi }_{k})\).
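As a concrete illustration of the component parameter structure \(\varvec{\Theta }_k = (\varvec{\mu }_k, \varvec{\Sigma }_k, \varvec{\phi }_k)\), here is a toy specification with two continuous and two categorical predictors; the dictionary layout and dimensions are our assumptions for illustration only.

```python
import numpy as np

# Illustrative Theta_k for J_C = 2 continuous predictors and J_D = 2
# categorical predictors with K_1 = 3 and K_2 = 2 categories.
theta_k = {
    "mu": np.zeros(2),                      # J_C-vector of component means
    "Sigma": np.eye(2),                     # (J_C x J_C) covariance matrix
    "phi": [np.array([0.5, 0.3, 0.2]),      # phi_k1: probabilities over K_1 = 3 levels
            np.array([0.6, 0.4])],          # phi_k2: probabilities over K_2 = 2 levels
}
```

Each probability vector \(\varvec{\phi }_{kj}\) must sum to one, mirroring the constraint \(\sum _{l=1}^{K_j} \phi _{kjl} = 1\).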

    A latent variable, \(Z_{i} = 1, \dots , K\), is used to assign individual data points to different components of the mixture model, and we assume independence between continuous and categorical components conditional on the cluster allocations [11, 12]. Thus the probability density for individual i, given \(Z_i\), is:

\[f\left( \textbf{X}_{i} \mid Z_i = k, \varvec{\Theta }_{k} \right) = \mathcal {N}\left( \textbf{X}^{C}_{i} \mid \varvec{\mu }_{k}, \varvec{\Sigma }_{k} \right) \prod \limits _{j = 1}^{J_D} \prod \limits _{l = 1}^{K_j} \phi _{kjl}^{\mathbb {1}\left( X^{D}_{ij} = l \right) }\]
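The conditional independence assumption means the per-individual density given an allocation factorises into a Gaussian term for the continuous part and a categorical probability for each discrete covariate. A minimal sketch under that assumption (names and argument layout are ours):

```python
import numpy as np
from scipy.stats import multivariate_normal

def component_density(x_cont, x_cat, mu_k, Sigma_k, phi_k):
    """Density of one individual's data given allocation Z_i = k.

    Conditional on the cluster, continuous and categorical parts are
    independent: a multivariate Gaussian density for x_cont multiplied by
    a categorical pmf for each discrete covariate. phi_k is a list of
    probability vectors, one per categorical covariate, and x_cat holds
    0-based category indices.
    """
    dens = multivariate_normal.pdf(x_cont, mean=mu_k, cov=Sigma_k)
    for j, level in enumerate(x_cat):
        dens *= phi_k[j][level]            # picks phi_kjl for the observed level l
    return dens

# example: one continuous value at the Gaussian mean, one binary covariate
d = component_density(0.0, [1], mu_k=0.0, Sigma_k=1.0, phi_k=[[0.2, 0.8]])
```

Multiplying `d` by the weight \(p_k\) and normalising over k gives the allocation probabilities used when sampling \(Z_i\), as detailed in the Supplementary Materials.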

More details on the component densities, prior distributions, and how to sample from the DPMM are given in the Supplementary Materials.