Abstract
To begin to build an understanding of the interactions of CYP2B6 with substrates, two different 3-dimensional quantitative structure activity relationship (3D-QSAR) models were constructed using 16 substrates of B-lymphoblastoid expressed CYP2B6. A pharmacophore model was built using the program Catalyst, which was compared with a partial least-squares (PLS) model using molecular surface-weighted holistic invariant molecular (MS-WHIM) descriptors. The Catalyst model yielded a 3-dimensional model of the common structural features of CYP2B6 substrates, whereas PLS MS-WHIM generated a model based on statistical analyses of molecular descriptors for size and shape of the substrate. The pharmacophore model obtained with Catalyst consisted of three hydrophobes and one hydrogen bond acceptor region. The cross-validated PLS MS-WHIM model gave a good q2 value of 0.607. Size, positive electrostatic potential, hydrogen bonding acceptor capacity, and hydrophobicity were found to be the most relevant descriptors for the model. These models were then used to predict the Km (apparent) values of a test set of structurally diverse substrates for CYP2B6 not included in the model building, specifically lidocaine, amitriptyline, bupropion, arteether, and verapamil. Overall, both 3D-QSAR methods yielded satisfactory Km (apparent) value predictions for the majority of the molecules in the test set. However, PLS MS-WHIM was unable to reliably predict theKm (apparent) value for verapamil, whereas Catalyst did not predict the Km (apparent)value for lidocaine. In both of these cases the residual of theKm (apparent) value was greater than one log unit. The strengths and limitations of both of these 3D-QSAR approaches are discussed.
An understanding of the substrate specificity of the cytochrome P-450 (CYP) enzymes is essential due to their clinically important role in the metabolism of xenobiotics and endobiotics (Wrighton and Stevens, 1992). There is no consensus regarding a 3-dimensional (3D) structure for any membrane-bound mammalian CYP active/binding site. Thus, our understanding of the structures of mammalian CYPs are limited to homology models built using bacterial CYPs as a template (Lewis and Lake, 1997) or models built and inferred from known inhibitor binding (Strobl et al., 1993). Many pharmacological studies have resolved receptor active/binding sites using numerous computational 3D-quantitative structure activity relationship (3D-QSAR) techniques (Koehler et al., 1996). These methods utilize relevant conformers of ligands to suggest functional groups, the geometry of structural features, and regions of electrostatic and steric interactions essential for activity or fit to the receptor binding/active site. The overall combination and 3D spatial distribution of physicochemical properties, the functional groups of the ligands, and a measure of binding site properties of an enzyme, such as theKm (apparent) (Nelsestuen and Martinez, 1997), define the pharmacophore. 3D-QSAR methods can also be applied to modeling CYP substrates for the binding/active site(s). Very few CYP binding/active site models have been constructed which utilize enzyme kinetic values for the site in question; these in turn have dealt solely with inhibitory interactions (Strobl et al., 1993; Jones et al., 1996). In contrast, there has been some application of QSAR for oxidation by CYP which was correlated with hydrophobic and steric parameters of substrates (Hansch and Zhang, 1993; Gao and Hansch, 1996). These same authors discussed the value of usingKm and kcat and reported that the majority of QSAR studies using these parameters resulted in a better than chance correlation (Gao and Hansch, 1996). Therefore, hydrophobic and steric properties were found to be necessary requirements for CYP mediated metabolism. Molecular electron substituent effects in the oxidation of CYPs were found to oppose enzyme-substrate complex formation and were not necessary; however, this may vary for each CYP (Gao and Hansch, 1996).
It has previously been reported that although CYP2B6 represents a small percentage of total human hepatic P-450 (<0.2%) (Shimada et al., 1994), its ability to catalyze the oxidation of xenobiotics may have been underestimated (Ekins et al., 1997, 1998). Many examples of xenobiotics metabolized by CYP2B6 have been identified (Ekins et al., 1998) and in some cases apparent kinetic parameters are also reported. This kinetic information would allow modeling of common features required of substrates for CYP2B6 (and indirectly provide some details of the active site) if the Km (apparent)values encompassed a sufficient range in affinity for the enzyme in question.
As yet there have been no studies describing absolute structural requirements for substrates or inhibitors nor their interactions with the amino acids in the active site of CYP2B6. A homology model of rat CYP2B1 aligned upon CYPBM-3 has been proposed and the amino acids that interact with substrates have been identified. Based on this homology modeling work with CYP2B1, most substrates for this enzyme appear to be involved in a π-stacking arrangement with side chains of Phe-181 and/or Phe-263. These amino acids were aligned in the same positions in homology modeled CYP2B6 (Lewis and Lake, 1997), implying their importance in substrate binding. However, caution should be observed in extrapolating the CYP2B1 homology model to CYP2B6 as these CYPs are only 75% similar by protein sequence (Wolf et al., 1988) and, as observed with other CYPs, species differences in metabolic capability are likely to occur. A clear example of this is cocaineN-demethylation which although mediated by rat CYP2B1 and rabbit CYP2B4 (Lewis and Lake, 1997) is not catalyzed by human CYP2B6 (Poet et al., 1996).
In the current study we describe two 3D-QSAR models for CYP2B6 substrates using Km (apparent) values generated with B-lymphoblastoid expressed CYP2B6. These 3D-QSAR models were then used to predict theKm (apparent) values of a test set of five CYP2B6 substrates not present in the training set. The predictions made by both models were then compared with the experimental values ofKm (apparent) derived by ourselves or others; this enabled a residual to be calculated. Both models generated in this study may be used in the future to predict substrate affinity of additional molecules. In addition, the pharmacophore model will eventually allow database searching and construction of a receptor site model (Hahn and Rogers, 1995) to confirm features essential for substrate binding in the active site of CYP2B6.
Materials and Methods
Materials.
NADPH, lidocaine, amitriptyline, and nortriptyline were purchased from Sigma Chemical Co. (St. Louis, MO). Bupropion, verapamil, and norverapamil were obtained from Research Biochemicals International (Natick, MA). Triethylamine was purchased from Aldrich Chemical Co. (Milwaukee, WI). Monoethylglycinexylidide (MEGX) was a kind gift from Dr. Peter Olinga (Groningen University, The Netherlands). Methanol and acetonitrile were purchased from Burdick and Jackson (Muskegan, MI). Microsomes prepared from human B-lymphoblastoid cells expressing vector only and human B-lymphoblastoid cell lines expressing CYP2B6 were purchased from Gentest Corp. (Woburn, MA).
Molecular Modeling.
The computational molecular modeling studies were carried out using Silicon Graphics Indigo and Onyx workstations.
Modeling with Catalyst.
The 3D structures of substrates were built interactively using either Catalyst version 2.3 or 3.1 (Molecular Simulations, San Diego, CA). The number of conformers generated for each substrate was limited to a maximum of 255 with an energy range of 20 kcal/mol. Ten hypotheses were generated using conformers for the 16 molecules in the training set and theKm (apparent) values (Table1) after manual selection of the hydrogen bond donor, hydrogen bond acceptor, hydrophobic, and negative ionizable features. After assessing all 10 hypotheses generated, the lowest energy cost hypothesis was considered the best.
Km(apparent) for metabolism of substrates for expressed CYP2B6
The goodness of the structure activity correlation was estimated by means of r. Statistical significance of the retrieved hypothesis was verified by permuting the response variable, i.e., the activities of the training set compounds were mixed a number of times (so that each value was no longer assigned to the original molecule) and the Catalyst hypothesis generation procedure was repeated.
Modeling with Molecular Surface-Weighted Holistic Invariant Molecular and Partial Least-Squares.
Each molecule was coded in a simplified molecular input line entry system string format (Weininger, 1988). Atomic 3D coordinates and Gasteiger-Huckel charges were generated by CONCORD 3.2.1 (CONCORD user’s manual, Tripos Inc., St. Louis, MO). Molecular surface-weighted holistic invariant molecular (MS-WHIM), descriptors were computed using the program EL3DMD (Bravi et al., 1997; Bravi and Wikel, 1998a,b). MS-WHIM descriptors are a set of statistical parameters which contain information about molecular structure in terms of size, shape, symmetry, and distribution of molecular surface point coordinates after weighted centering and principal component analysis. The following weights are applied: unweighted value, positive electrostatic potential, negative electrostatic potential, hydrogen bond acceptor, hydrogen bond donor capacity, and hydrophobicity; these weights yield a total of 72 descriptors [see Bravi et al. (1997) for details on MS-WHIM].
Partial least-squares (PLS) (Wold et al., 1993) was applied to correlate MS-WHIM descriptors with observedKm (apparent) data of the training set structures. Because variance associated with different descriptors can be very different from one another, MS-WHIM descriptions were autoscaled to assign unit variance to each descriptor. Activity data were transformed as log(1/Km). The optimum number of components in each PLS model generated was determined through two cross-validation procedures: leave-one-out (LOO) and five random groups repeated up to 100 times (5RGx100) (Baroni et al., 1993). Within the latter protocol, training set molecules are randomly assigned to five groups. Since results are strongly influenced by initial random assignment, the entire cross-validation procedure is repeated many times to achieve a meaningful statistical result. The predictive power of the PLS model was evaluated by means of the cross validated r2 (also known asq2) and standard deviation of error of prediction (Baroni et al., 1993) which are computed as follows:
Prediction with the Test Set Molecules.
The test set of five molecules (lidocaine, amitriptyline, bupropion, verapamil, and arteether) not included in the initial training set of 16 molecules, were best-fit to the best Catalyst hypothesis in order to predict theKm (apparent) value. Theq2 value was used to select the best PLS MS-WHIM model and was successively used to predict theKm (apparent) value of test set molecules. PLS also provides for a rapid evaluation of how test set molecules are structurally similar to the training set, hence how predictions can be considered reliable. This method is based on a comparison between the unexplained variance (snew2) of the molecule to be predicted and the unexplained variance of the training set (s02) by means of an approximate Fisher’s exact test (Wold et al., 1993).
Lidocaine N-Deethylation Assay.
Initial rate conditions for the formation of MEGX from lidocaine were determined in preliminary studies with microsomes prepared from cell lines expressing CYP2B6. Microsomes (0.4 mg/ml) were preincubated in plastic Eppendorf tubes for 3 min at 37°C with a final concentration of 100 mM sodium phosphate buffer (pH 7.4) and 1 mM NADPH in a 250-μl final volume. Reactions were initiated by addition of lidocaine (0.25–500 μM), terminated after 60 min by adding 25 μl of 70% perchloric acid and then placed on ice. After centrifugation, a 50-μl aliquot of supernatant was analyzed for MEGX by high-performance liquid chromatography (HPLC).
The reversed-phase HPLC method utilized a mobile phase consisting of water and acetonitrile 70/30 v/v containing 20 mM sodium perchlorate, pH 2.5, at a flow rate of 2 ml/min. The HPLC system (Shimadzu, Kyoto, Japan) utilized an ultraviolet detector at 205 nm and a Nucleosil C18 5-μm, 250 × 4.6 mm column (Metachem Technologies Inc., Torrance, CA) with a Nucleosil C18 5-μm, safeguard cartridge guard column. Retention times of MEGX and lidocaine were 3.3 and 4.5 min, respectively.
Amitriptyline N-Demethylation Assay.
Initial rate conditions for the formation of nortriptyline from amitriptyline were determined in preliminary studies with microsomes prepared from cell lines expressing CYP2B6. Microsomes (0.4 mg/ml) were preincubated as described above in a 250-μl final volume. Reactions were initiated by addition of amitriptyline (5–1000 μM), terminated after 90 min by adding 125 μl of 20% trichloroacetic acid and then placed on ice. After centrifugation, a 50-μl aliquot of supernatant was analyzed for nortriptyline by HPLC.
The reversed-phase HPLC method utilized a mobile phase consisting of water and acetonitrile 70/30 v/v containing 1% triethylamine pH 3 at flow rate of 2 ml/min. The HPLC system (Shimadzu, Kyoto, Japan) utilized an ultraviolet detector at 220 nm and a Monochrom C18 5-μm, 150 × 4.6 mm column (Metachem Technologies Inc., Torrance, CA). Retention times of nortriptyline and amitriptyline were 5.7 and 6.5 min, respectively.
Bupropion Hydroxylase Assay.
Initial rate conditions for the formation of hydroxybupropion were determined in preliminary studies with microsomes prepared from cell lines expressing CYP2B6. Microsomes (0.6 mg/ml) were preincubated as described above in a 250-μl final volume. Reactions were initiated by addition of bupropion (10–1000 μM), terminated after 180 min by adding 125 μl of acetonitrile and then placed on ice. After centrifugation, a 25-μl aliquot of supernatant was analyzed by HPLC.
The reversed-phase HPLC method utilized a mobile phase A consisting of 0.1% formic acid and 0.01% trifluoroacetic acid in distilled water, whereas mobile phase B was 100% acetonitrile. At a flow rate of 2 ml/min, the percentage of mobile phase B changed from 10% at 0 min to 40% at 5 to 11 min; it then returned to 10% at 12 min for 1 min. The HPLC system (Shimadzu, Kyoto, Japan) utilized an ultraviolet detector at 248 nm and a Monochrom C18 5-μm, 150 × 4.6 mm column (Metachem Technologies Inc., Torrance, CA). Retention times of the major metabolite, putatively hydroxybupropion and bupropion were 4.7 and 9 min, respectively.
Hydroxybupropion was identified by liquid chromatography-mass spectroscopy using the same HPLC conditions as described above, except the flow rate was 1 ml/min and split 1:4 (with 250 μl into the electrospray and 750 μl to waste). A 10-μl volume of sample was injected. A Finnigan TSQ-7000 was used in positive ion mode with 5 V spray voltage; the capillary heater was set at 250°C. Nitrogen was used for both the sheath and auxiliary gas at 80 psi and 20 ml/min, respectively. Argon was the collision gas at 1.6 mTorr, while the collision energy was −25eV. Full scan (Q1) detection was performed atm/z 100 to 500 at 1 s. Two peaks were apparent atm/z 256 (6.21 min) and m/z 240 (7.02 min), respectively, in incubations of bupropion with lymphoblastoid expressed CYP2B6. The peak at m/z 240 corresponded to the standard bupropion. The product ion spectrum for this peak was identical with that observed for the standard bupropion. The peak at m/z256 corresponded to a hydroxylated metabolite of bupropion. The product ion spectrum showed ions at m/z 55, 71, 131, 139, 166, 184, and 237/239. These ions all indicated that this peak is related to bupropion. Additionally, the ion at m/z 237/239 showed the loss of water (18 amu) from m/z 256, whereas the ions atm/z 55 and 71 indicated the hydroxyl to be present on the side chain of the molecule rather than the benzyl ring. A second much smaller peak was observed at m/z 256 corresponding to a different hydroxylated metabolite of bupropion. The product ion mass spectrum of this peak showed ions at m/z 57, 147, 182, and 199; however, there is no ion representing a loss of water. The ion atm/z 57 indicated that there was no modification of the side chain. The ions at m/z 147 and 182 are 16 amu larger than the ions observed in standard bupropion at m/z 131 and 166, respectively. These ions indicated that the hydroxyl function may be located on the benzyl ring. Neither of the hydroxylated metabolites were present in incubations in the absence of NADPH. These observations strongly suggest that lymphoblastoid expressed CYP2B6 preferentially hydroxylated bupropion on the side chain rather than the benzene ring under the conditions used in this study.
Verapamil Metabolism.
Initial rate conditions for the formation of verapamil metabolites were determined in preliminary studies with microsomes prepared from cell lines expressing CYP2B6. Microsomes (0.4 mg/ml) were preincubated as described above in a 250-μl final volume. Reactions were initiated by addition of verapamil (5–1000 μM), terminated after 180 min by adding 125 μl of acetonitrile, and then placed on ice. After centrifugation, a 5-μl aliquot of supernatant was analyzed by HPLC.
The reversed-phase HPLC method utilized a mobile phase A consisting of 0.1% formic acid and 0.01% trifluoroacetic acid, whereas mobile phase B was 100% acetonitrile. At a flow rate of 2 ml/min, the percentage of mobile phase B changed linearly from 25% at 0 min to 30% at 19 min; then, it increased to 100% at 20 min and returned to 25% at 25 min. The HPLC system (Shimadzu, Kyoto, Japan) utilized a fluorescence detector at 280-nm excitation and 310-nm emission and a YMC basic 5-μm, 150 × 4.6 mm column (YMC Inc., Wilmington, NC). Retention times of the major metabolite and verapamil were 12.9 and 19 min, respectively.
O-Desmethyl verapamil was identified by liquid chromatography-mass spectroscopy using the same HPLC conditions as described above except the flow rate was 1 ml/min and split 1:4 (with 250 μl into the electrospray and 750 μl to waste). An injection volume of 50 μl was utilized. A Finnigan TSQ-700 was used in positive ion mode with 5-V spray voltage; the capillary heater was set at 250°C. Nitrogen was both the sheath and auxiliary gas at 80 psi and 20 ml/min, respectively. Argon was the collision gas at 1.5 mTorr, while the collision energy was −25eV. Full scan (Q1) detection was performed at m/z 200 (275)-650 at 1 s.
A standard mixture of verapamil and norverapamil was injected and analyzed by LC/MS and LC/MS/MS. The mass chromatograms of the protonated ions at m/z 455 (verapamil) and m/z441 (norverapamil) showed retention times of 12.24 and 12.22 min, respectively. Analysis of both standards by LC/MS/MS showed similar fragmentation. Prominent ions at m/z 165, 260, and 303 for verapamil and m/z 165, 260, and 289 for norverapamil were observed in the product ion spectra. Analysis of a sample mixture incubated in the presence of NADPH showed potential metabolite peaks atm/z 441 (11.09 min) and m/z 471 (11.35 min) after LC/MS analysis. Product ion mass spectra of these two peaks were obtained by LC/MS/MS, which showed a peak at m/z 441. This peak had prominent fragment ions at m/z 165, 246, and 289. These ions indicate the peak is a demethylated metabolite of verapamil but its earlier retention time and the presence of an ion atm/z 246 suggest it is different from that of the norverapamil standard, hence it is likely to be O-desmethyl verapamil. The product ion mass spectrum of the peak at m/z471 did not give enough information to determine if the peak is a metabolite of verapamil.
Kinetic Evaluation and Interpretation.
Kinetic analyses of verapamil O-demethylation, amitriptylineN-demethylation, bupropion hydroxylation, and lidocaineN-deethylation were initially assessed by visual examination of Eadie-Hofstee plots to determine whether Michaelis-Menten enzyme kinetics were observed or whether an allosteric activation mechanism was apparent (Enzyme Kinetics 1.6; Window Chem Software Inc., Fairfield, CA). The estimates for the kinetic parameters from these analyses were used as initial estimates for nonlinear regression analyses (NONLIN, version VO2-G-VAX; Statistical Consultants, Inc., Lexington, KY) fitting the data to Michaelis-Menten or Hill equations and generatingKm (apparent), Vmax (apparent), and, when appropriate, n (number of sites bound by activator)(Segel, 1993). The best fit to a particular model was determined by examination of (in order of importance) the randomness of the residuals, the size of the residuals, the standard error of the parameter estimates (first between models, then within a model using the best fit weighted model), and finally, when necessary, the difference between sums of squares using Fisher’s exact test (Boxenbaum et al., 1974).
Results
Catalyst CYP2B6 Substrate Pharmacophore.
Catalyst is a commercially available program that enables pharmacophore construction by using a collection of molecules with affinities over a number of orders of magnitude for the enzyme binding site. The pharmacophore hypotheses produced by Catalyst explain the variability of the affinity of the substrates for the active site with respect to the geometric localization of features present in the molecules used to build it.
The previously reported Km (apparent)values for 16 substrates of CYP2B6 (Table 1) enabled the construction of a pharmacophore for the active site of this enzyme (Fig.1) using Catalyst. Multiple conformers of all 16 substrates were used to generate this pharmacophore, whereKm (apparent) values ranged over four orders of magnitude (Table 1). The lowest energy pharmacophore produced four features suggested to be necessary for binding to the active site, namely, three hydrophobes and one hydrogen bond acceptor (Fig. 1) whose 3D coordinates are also reported (Table2). The three hydrophobic regions present in the pharmacophore were located at distances of 5.3, 3.1, and 4.6 Å from the hydrogen bond acceptor, and had intermediate angles of 72.8 and 67.6O (Fig. 1). The substrate 7-ethoxy-4-trifluoromethylcoumarin has a relatively lowKm for CYP2B6 and demonstrates structural groups that correspond to all of the pharmacophore features (Fig. 2), resulting in a good estimation for the Km value (estimatedKm = 1.8 μM; observedKm = 1.7 μM). In contrast, the higherKm substrate S-mephenytoin has structural groups that fitted fewer of the pharmacophore features (Fig.3). Nonetheless, theKm value for S-mephenytoin was predicted reasonably well by the model (estimatedKm = 480 μM; observedKm = 564 μM).
The Catalyst-produced CYP2B6 substrate pharmacophore (See Materials and Methods) illustrating three hydrophobic areas (cyan) and a hydrogen bond acceptor feature (green) with a vector in the direction of the putative hydrogen bond. The inter bond angles and the distances between each hydrophobe and the hydrogen bond acceptor are also annotated.
The three-dimensional coordinates of Catalyst pharmacophore features for the CYP2B6 substrate model
The Catalyst-produced CYP2B6 substrate pharmacophore with 7-ethoxy-4-trifluoromethylcoumarin (see Materials and Methods) illustrating three hydrophobic areas (cyan) and a hydrogen bond acceptor feature (green).
The Catalyst-produced CYP2B6 substrate pharmacophore with S-mephenytoin (see Materials and Methods) illustrating three hydrophobic areas (cyan) and a hydrogen bond acceptor feature (green).
The Catalyst pharmacophore demonstrated a strong correlation of observed versus estimated Km values (r of 0.85, Fig. 4). The total energy cost of the generated pharmacophore can be calculated as the deviation between the estimated activity and the observed activity, combined with the complexity of the hypothesis (the number of features). One can also calculate a null hypothesis, which presumes that there is no relationship in the data and that experimental activities are normally distributed about their mean. Hence, the greater the difference between the energy cost of the generated hypothesis and the energy cost of the null hypothesis, the less likely it is that the hypothesis reflects a chance correlation. For the model generated here, the total energy cost of the hypothesis (arbitrary units) was 75.9, whereas the energy cost of the null hypotheses was 94.2, respectively. To investigate whether the difference between these two hypotheses (18.3) is statistically relevant, theKm values were permuted several times so that the new activity vector would have a null correlation. Unexpectedly, Catalyst was still able to generate hypotheses with the same pharmacophoric features, and the total cost of the hypothesis was similar to that of the original hypothesis. On the basis of these results, a difference of 18.3 units appears to be too low to suggest a conclusive quantitative model, although it may be of qualitative value.
Correlation of Catalyst-produced estimatedKm (apparent) values with the observedKm (apparent) values generated with CYP2B6 (see Materials and Methods). The central line corresponds to the regression for the data, the dashed lines represent the 95% confidence interval for the regression while the outer lines are the 95% confidence interval for the population.
PLS MS-WHIM CYP2B6 Substrate Model.
It is generally accepted that receptors and substrates recognize each other at their molecular surface (Wagener et al., 1995). Therefore, the binding of a ligand depends on the shape of its surface and on the distribution of certain properties (e.g., electrostatic potential) on this surface. MS-WHIM modeling approaches attempt to capture this information and condense it into a brief numerical vector which can be used for 3D-QSAR purposes. This MS-WHIM approach has been successfully applied to study structure-property and structure-activity problems (Bravi et al., 1997;Bravi and Wikel, 1998a,b).
MS-WHIM-autoscaled descriptors were calculated for each substrate as previously described (Bravi et al., 1997) and correlated by means of PLS with Km data expressed as log(1/Km). The model was then cross-validated using LOO and 5RGx100 protocols to yield aq2. Cramer et al. (1993) have demonstrated that a positive value for q2 greater than 0.3 indicates that the model is meaningful (predictive), whereas aq2 value of 1 is optimal. The model selected yielded the highest LOO q2 and included the following molecular surface properties: unitary value (unweighted), positive electrostatic potential, hydrogen bonding acceptor capacity, and hydrophobicity. Forty-eight MS-WHIM descriptors based on the above properties yielded a model with a LOOq2 value of 0.674 with 8 components. When applying the more realistic 5RGx100 the q2dropped to 0.607 with 8 components and a standard deviation of 0.079. The difference between LOO and 5RGx100 results most likely is related to the low number of molecules included in the training set. LOO requires only one molecule to be left out of the model compared with 5RGx100 protocol which successively removes 20% of the molecules from the training set and predicts theirKm (apparent). The 5RGx100 is a less optimistic approach toward predicting activity than LOO and represents a more credible cross-validation protocol. Standard deviation of error of predictions are directly related to the uncertainty of predictions and gives an estimation of the confidence limits of the data. Standard deviation of error of predictions values of 0.612 (LOO) and 0.669 (5RGx100, with a standard deviation of 0.066) were obtained for this model. A plot of experimental versus PLS MS-WHIM-predicted activities for all 16 substrates demonstrated that all but four substrates lie within the 95% confidence interval for this line fitted by linear regression (Fig. 5). This describes the range where the regression line values will fall. However, more importantly, only one substrate prediction lies outside of the 95% confidence interval for the population, which defines the overall scatter of the data. Random permuting of the response variable a number of times (so that the Km (apparent) values no longer corresponded to the same molecules) was adopted to test the validity of the obtained model, as this procedure should not result in a predictive model. In each case a q2 of zero or lower was obtained, indicating that permuting the response variable does not result in a predictive PLS MS-WHIM model. This approach confirms that the cross validated PLS MS-WHIM 3D-QSAR model obtained for CYP2B6 is statistically valid.
Correlation of PLS MS-WHIM-produced predictedKm (apparent) values with the observedKm (apparent) values generated with CYP2B6 (see Materials and Methods). The central line corresponds to the regression for the data. The dashed lines represent the 95% confidence interval for the regression, whereas the outer lines represent the 95% confidence interval for the population.
Test Set Predicted Km (apparent)and Observed Km (apparent).
After constructing the Catalyst and PLS MS-WHIM 3D-QSAR models from the training set of substrates, we were able to use them to predict theKm (apparent) values for a test set of molecules not included in the training set. TheKm (apparent) values were predicted for a number of CYP2B6-mediated biotransformation pathways including verapamil O-demethylation, bupropion hydroxylation, amitriptyline N-demethylation and lidocaineN-deethylation. These predicted values were then compared with the corresponding values generated using B-lymphoblastoid expressed CYP2B6 (see below). Recently, arteether was identified as substrate for CYP2B6 and kinetic parameters were calculated after using B-lymphoblastoid expressed CYP2B6 (Grace et al., 1998). TheKm (apparent) for arteether biotransformation was also predicted using both the Catalyst and PLS MS-WHIM 3D-QSAR models for CYP2B6.
The kinetic parameters for biotransformation of four of the molecules present in the test set (lidocaine N-deethylation, amitriptyline N-demethylation, bupropion hydroxylation, and verapamil O-demethylation) were all determined using microsomes from B-lymphoblastoid cells expressing CYP2B6. These incubations were carried out as previously described (Ekins et al., 1998) and utilized 100 mM phosphate buffer, which is most frequently used for those molecules included in the training set (Table 1). In all cases the contribution of endogenous CYP activity present in the microsomes used, obtained from B-lymphoblastoid control cells incorporating the vector, was negligible.
Apart from arteether, the apparent kinetic parameters for the substrates described in Table 3 have not been previously published. The data obtained for the biotransformation of these substrates, except for verapamil O-demethylation, best fit the single enzyme Michaelis-Menten model. In this case the results with the O-desmethyl verapamil best fit the Hill equation (Segel, 1993), with the number of sites bound by compound(n) equaling 1.1 (Table 3).
Observed kinetic parameters and predictedKm(apparent) parameters for metabolism of substrates for CYP2B6 best fit to the Catalyst and PLS MS-WHIM CYP2B6 substrate models
Table 3 also reports predictedKm (apparent) values for these biotransformations obtained using the Catalyst-generated pharmacophore and the PLS model based on MS-WHIM descriptors. Catalyst was able to satisfactorily predict arteether, bupropion, verapamil, and amitriptyline with residuals lower than one log unit (Table 3). However, it failed to predict theKm (apparent) for lidocaine as the residual was greater than one log unit. This poor prediction is possibly due to a sterically unfavorable interaction with the active site of CYP2B6 as the aromatic moiety of lidocaine lies beneath the plane of the hydrogen bond acceptor (Fig.6).
The Catalyst-produced CYP2B6 substrate pharmacophore with lidocaine (see Materials and Methods) illustrating three hydrophobic areas (cyan) and a hydrogen bond acceptor feature (green).
The PLS MS-WHIM model yielded satisfactory predictions for four of five of the test set molecules. Verapamil was estimated to have a higher affinity for CYP2B6 according to PLS MS-WHIM than was observed with expressed CYP2B6. (This was overestimated by the model.) The Fisher’s exact test for describing structural similarity to the training set when conducted on verapamil revealed that on the basis of MS-WHIM descriptors, the structure of verapamil is dissimilar to the training set molecules (Table 3). Hence the prediction for verapamil by the PLS MS-WHIM model should be considered unreliable.
Discussion
It is possible to define the substrate specificity of the active site of an unknown enzyme by building a pharmacophore that utilizes a number of molecules and their affinities for this site. This approach provides complementary information to modeling the substrate binding site for the CYPs, as no crystal structure for a mammalian CYP has been identified. There have been several attempts to build pharmacophores for a CYP by manual overlapping of substrates for CYP2D6 (de Groot et al., 1997 and references therein) and CYP2C9 (Jones et al., 1996). These along with other indirect QSAR modeling techniques have been used to gain some limited understanding regarding the regio- and stereoselective properties and active site geometries of these enzymes. The present study has described two different 3D-QSAR analyses performed on a training set of 16 CYP2B6 substrates. The creation of these models enabled prediction of theKm(apparent) for a test set of five CYP2B6 substrates, which were excluded from the training set.
These two computational approaches, Catalyst and PLS MS-WHIM, are two quite different modeling approaches. Within Catalyst, molecules are described in terms of pharmacophoric features, i.e., hydrogen bond donor or acceptor atoms, hydrophobic moieties, and positively or negatively charged groups. The final result is represented by the 3D spatial disposition of essential features for the activity. The advantages of Catalyst are in the ease of visual interpretation of the results, providing complementary information about the active site, relative estimations of activities of a test set, and the potential use of the pharmacophore for either molecule design or 3D database searches. On the other hand, Catalyst attempts to produce a structure-activity correlation considering only the geometrical disposition of structural features. This is something of an oversimplification, as a number of other physicochemical properties and the 3D geometrical disposition of pharmacophoric features possessed by the molecule may also affect the biological activity.
In comparison to Catalyst, MS-WHIM descriptors are aimed at capturing global 3D chemical information at the molecular surface level. The shape and the electronic properties of a molecule are characterized in 3D space and condensed into a brief numerical vector. A structure-activity relationship is derived by means of PLS and the final model is presented in the form of a linear equation. The advantages of this approach are in the high speed of computation and high degree of automation of MS-WHIM descriptors, of which there are fewer than required for comparative molecular field analysis. Moreover, the MS-WHIM approach provides for a global molecular description with results invariant to molecular rotation and translation. Hence, molecules do not need to be aligned before MS-WHIM computation. This is the main advantage of “global” descriptors over “local” descriptors such as those generated by comparative molecular field analysis, whose results are strictly dependent on how molecules have been aligned and superimposed. On the other hand, the mathematical procedure that underlies the calculation of MS-WHIM descriptors makes their physical interpretation almost impossible. Therefore, a 3D display of the PLS MS-WHIM model is not available. As a consequence, MS-WHIM-based models can be solely used to predict the activity of unknown molecules.
In the present study both 3D-QSAR approaches yielded models that were then subjected to validation procedures. The Catalyst pharmacophore consisted of three hydrophobic features that are likely to interact with a hydrophobic binding domain within the active site of CYP2B6. In addition, this pharmacophore possesses a hydrogen bond acceptor feature with a vector representing the projected point location of the target hydrogen bond donor (Fig. 1). However, Catalyst pharmacophore hypotheses were developed even after using permutedKm (apparent) values. This suggests that the CYP2B6 pharmacophore generated is suboptimal. There have been few published reports utilizing Catalyst (Kaminski et al., 1997) and of these none have commented upon the required size of data sets, the hypothesis cost differential compared to the null hypothesis, or statistical analyses necessary to produce valid pharmacophores. It is therefore difficult to put the Catalyst pharmacophore observations completely into context, although it does provide a starting point for future comparisons.
The best PLS MS-WHIM model was obtained using the following molecular surface properties: unitary value (unweighted), positive electrostatic potential, hydrogen bonding acceptor capacity, and hydrophobicity. Based upon statistical significance, the model produced by PLS MS-WHIM appears to be more robust than the Catalyst pharmacophore. Theq2 obtained by means of two cross-validation techniques, LOO and 5RGx100, are meaningful as they generate positive values of q2 > 0.6, 2-fold higher than the accepted value for a meaningful model (Cramer et al., 1993). To confirm that this model does not reflect a chance correlation, the response variables were permuted several times and cross-validation was repeated. In each case, after permuting the response variables, null or even negativeq2 values were obtained indicating that the models generated were not meaningful.
A prospective study using the models generated by both 3D-QSAR methods enabled the prediction of theKm (apparent) of substrates in the test set. Catalyst succeeded in predicting arteether, verapamil, amitriptyline, and bupropion Km (apparent)values as the residuals from the observed value were all less than one log unit. However, Catalyst failed in predicting lidocaine which was predicted as having a lower Km (apparent)by 3.69 log units. A potential reason for this failure is that Catalyst considers a molecule active if it possesses or orients in the proper manner all of the essential features necessary for activity. Other factors could contribute to such an error such as a molecular volume which is not represented in the training set, leading to van der Waals interactions with the enzyme active site. The latter point would appear to be the case for lidocaine. This molecule was predicted to be very active because it was able to closely match all four pharmacophore features. However, the phenyl ring of lidocaine (Fig. 6) occupies a region in space that was not represented in the molecules present in the training set and may negatively interact with the active site of CYP2B6. This is a known limitation of Catalyst, and we can conclude that this molecule is therefore too dissimilar to the training set compounds in order to consider its prediction reliable using this technique.
The PLS MS-WHIM model was very successful in predicting lidocaine, arteether, amitriptyline, and bupropion with residuals lower than one log unit. PLS MS-WHIM poorly predicted one molecule, verapamil, which was predicted as having a lowerKm (apparent) by 3.7 log units. However, the Fisher’s exact test highlighted that this compound is dissimilar to the training set compounds. The F-value is considerably higher with respect to the well predicted test set compounds (Table 3). Therefore, the PLS MS-WHIM-predictedKm (apparent) value for verapamil was considered unreliable.
To our knowledge, Catalyst has not been previously compared with other 3D-QSAR techniques such as MS-WHIM. In this study, activity predictions obtained with Catalyst had larger residuals than the predictions generated from the PLS MS-WHIM model. This difference between the two 3D-QSAR models may be centered around reliance on the geometric disposition of pharmacophore features for Catalyst and our selection of the first of 10 hypotheses, rather than the multiple shape and electronic descriptors used in the PLS MS-WHIM technique. The first Catalyst hypothesis was considered the best as the second and subsequent hypotheses were closer to the null and therefore not considered. We are aware of the limited variance of the substrateKm (apparent) values present in the test set (3.3–4.5 molar log units) and concentration around the mean (3.9 molar log units). This is close to the training set meanKm (apparent) value for this model (4.1 molar log units), which covers a much larger range ofKm (apparent) values (1.7–5.9 molar log units). Further molecules need to be included in the test set to increase the range of activity and confidence in the predictive nature of the models. Even though the Catalyst pharmacophore model may be less quantitative than the PLS MS-WHIM model, it may be of qualitative value as it provides a visual representation of molecular features necessary for substrates to fit in the active site of CYP2B6. The reasoning behind our choice of 1 log unit residual as our cutoff for predictions was due to the variability of much of the published in vitro derived kinetic data, which can vary significantly (>1 log) between laboratories for the same biotransformation. Therefore, we feel that predicting within a 1 log unit residual and producing an approximate rank order of the activity may be as useful as predicting in vitro data with greater precision.
In order to prospectively generate a test set for our 3D-QSAR models the present study identified a number of substrates for B-lymphoblastoid expressed CYP2B6 which are also substrates for other human CYPs. The first example is the local anesthetic and antiarrhythmic lidocaine which has been widely recognized as a substrate for CYP3A4 in human liver microsomes. Recently, CYP2B6 has been implicated in lidocaine N-deethylation to form MEGX (Imaoka et al., 1996). The current report illustrated that CYP2B6 has a lower Km (apparent) for lidocaine (537.6 μM) compared to that described for the CYP3A4 mediated highKm (apparent) component (1388–1927 μM)(Bargetzi et al., 1989). The finding that amitriptyline is a substrate for lymphoblastoid-expressed CYP2B6 (Table 1) is not unexpected since the N-demethylation of another tricyclic antidepressant, imipramine, is catalyzed by CYP2B6 (Koyama et al., 1997). Interestingly, the Km (apparent)for amitriptyline is approximately half the value reported for theKm (apparent) of imipramine. The current study suggests that although both tricyclics differ in the saturation of only one aliphatic C-C bond and the addition of a nitrogen atom, this can have a marked effect on substrate affinity for CYP2B6. The physiological role of CYP2B6 with regard to amitriptylineN-demethylation is questionable since itsKm (144 μM) is substantially greater than that (20 μM) of an implicated unidentified CYP (Gharamani et al., 1997). The calcium channel antagonist, verapamil, has been reported to be metabolized to norverapamil by CYP3A with CYP1A2 also participating in vitro (Kroemer et al., 1992). Interestingly, the current study suggests that CYP2B6 is responsible for formation ofO-desmethyl verapamil metabolite as opposed to theN-demethylated metabolite. VerapamilO-demethylation demonstrated non-Michaelis-Menten kinetics, which were then fitted to the Hill equation (Table 3), as described previously for 7-ethoxy-4-trifluoromethycoumarin (Ekins et al., 1997) and testosterone (Ekins et al., 1998). In agreement with a previous study, lymphoblastoid-expressed CYP2B6 was also the major CYP responsible for bupropion hydroxylation (Wurm et al., 1996), although no Km (apparent) value had been generated previously.
Future work will involve further developing the Catalyst-produced CYP2B6 model. In addition, this same pharmacophore can also be used to search chemical databases to identify and obtain substrate probes for CYP2B6. These are all potential directions that would enable further understanding of CYP2B6 and are made possible by the 3D nature of the Catalyst-generated pharmacophore. In the meantime, the PLS MS-WHIM model is preferred to the Catalyst model, based on cross-validation procedures for the prediction ofKm (apparent) values for CYP2B6 substrates.
Acknowledgments
We thank Dr. Peter Olinga (Groningen University, The Netherlands) for the kind gift of MEGX, Dr. James Grace (Walter Reed Army Institute of Research, Washington, DC) for providing a preprint of his paper on arteether metabolism, and Robert Coner (Eli Lilly and Co.) for software support. Dr. David Cummins (Eli Lilly and Co.) is acknowledged for his help in implementation of the Fisher’s exact test in the PLS algorithm, and Dr. Patrick J. Murphy is thanked for his initial encouragement in pursuing this direction.
Footnotes
-
Send reprint requests to: Steven A. Wrighton, Ph.D., Department of Drug Disposition, Lilly Research Laboratories, Eli Lilly and Co., Lilly Corporate Center, Drop Code 0825, Indianapolis, IN 46203.
-
↵1 Present address: Central Research Division, Pfizer Inc., Groton, CT 06340.
-
↵2 Present address: Glaxo Wellcome Research and Development, Medicines Research Center, Gunnels Wood Road, Stevenage, Hertfordshire, SG1 2NY, United Kingdom.
- Abbreviations:
- CYP
- cytochrome P-450
- 3D-QSAR
- 3-dimensional quantitative structure activity relationship
- LOO
- leave one out
- MS-WHIM
- molecular surface-weighted holistic invariant molecular
- PLS
- partial least squares
- 5RGx100
- five random groups repeated up to 100 times
- 3D
- 3-dimensional
- MEGX
- monoethylglycinexylidide
- HPLC
- high-performance liquid chromatography, LC/MS, liquid chromatography/mass spectroscopy
- LC/MS/MS
- liquid chromatography/mass spectroscopy/mass spectroscopy
- Received March 26, 1998.
- Accepted August 26, 1998.
- The American Society for Pharmacology and Experimental Therapeutics