Abstract
CYP2A13 is a human cytochrome P450 monooxygenase that is efficient in the metabolic activation of tobacco-specific nitrosamines. Sequence variations that affect CYP2A13 expression may contribute to interindividual differences in susceptibility to tobacco-related tumorigenesis. The aim of this study was to identify any impact of CYP2A13 single-nucleotide polymorphisms (SNPs) on CYP2A13 expression in human lung. Expression levels of CYP2A13 mRNA in normal lung displayed significant interindividual variation (>50-fold). Preliminary sequence analysis of CYP2A13 RNA-polymerase chain reaction (PCR) products suggested that a 7520C > G variation, located in the 3′-untranslated region, could be associated with low transcript abundance. Subsequently, we developed a method for the measurement of relative allelic expression, by taking advantage of the capability for melting-curve analysis in real-time PCR. Quantitative analyses using this method indicated that transcripts from the 7520G-containing alleles were >10-fold less abundant than those from the 7520C-containing alleles in 14 of 16 samples examined. The frequencies of the 7520C > G variation in anonymous White, African American, Hispanic, and Asian newborns from New York State were found to be 5.2, 26.8, 17.7, and 4.3%, respectively. The 7520C > G SNP was previously known to be present in both CYP2A13*1H and *3 alleles. However, analyses of SNP distribution indicated that, in 15 of the 16 heterozygous DNA samples, the 7520C > G SNP belonged to new CYP2A13*1 haplotypes. These findings provide a basis for further studies that associate CYP2A13 haplotypes with incidences of smoking-related lung tumors and for studies on the mechanisms of the low-expression phenotype of the 7520G-containing allele.
CYP2A13, a cytochrome P450 (P450) monooxygenase, is believed to play an important role in tobacco-related tumorigenesis in the respiratory tract (Su et al., 2000; Wang et al., 2003). Heterologously expressed CYP2A13 is highly effective in the metabolic activation of a major tobacco-specific carcinogen, 4-(methylnitrosamino)-1-(3-pyridyl)-1-butanone (NNK), with a catalytic efficiency much greater than that of other human P450s examined to date (Su et al., 2000). CYP2A13 also participates in the metabolic activation of other chemical carcinogens, such as 4-(methylnitrosamino)-1-(3-pyridyl)-1-butanol (Jalas et al., 2003), N-nitrosodiethylamine, and hexamethylphosphoramide (Su et al., 2000). The CYP2A13 gene, which is located in the CYP2 gene cluster on human chromosome 19 (Fernandez-Salguero et al., 1995; Hoffman et al., 2001), is selectively expressed in the respiratory tract (Koskela et al., 1999; Su et al., 2000; Chen et al., 2003).
CYP2A13 genetic polymorphisms may be associated with interindividual differences in susceptibility to tobacco-related tumorigenesis, because the resultant variations in CYP2A13 expression and metabolic activity can significantly alter the extent of NNK metabolic activation in human lung. To date, more than 20 single-nucleotide polymorphisms (SNPs) have been identified in CYP2A13 (Zhang et al., 2002, 2003; Fujieda et al., 2003; Saito et al., 2003; Cauffiez et al., 2004), but only two of these, an Arg101Stop mutation in exon 2, and an Arg257Cys variation in exon 5, are known to have functional consequences. The Arg101Stop mutation, which is relatively infrequent (Saito et al., 2003; Zhang et al., 2003; Cauffiez et al., 2004), represents a null allele, and the Arg257Cys variation, with allele frequencies in various ethnic groups ranging from 0 to 14% (Wang et al., 2003; Zhang et al., 2003; Cauffiez et al., 2004; Cheng et al., 2004), leads to ∼50% decreases in metabolic activities toward all substrates examined (Zhang et al., 2002). The significance of the Arg257Cys SNP in lung cancer risk has been demonstrated by a recent epidemiological study, which found that the 257Cys allele is associated with substantially reduced risks for smoking-related lung adenocarcinoma in a Chinese population, especially for light smokers (Wang et al., 2003). The Arg101Stop mutation, on the other hand, seems to be associated with an increased risk for small cell lung carcinoma (Cauffiez et al., 2004).
In the present study, we determined the potential impact of selected CYP2A13 SNPs on levels of CYP2A13 expression in human lung. We first determined the level of total CYP2A13 mRNA in 18 normal lung biopsy tissues to assess the extent of interindividual variation. To avoid potential confounding by sample-to-sample differences in the quality and yield of RNA preparations, or by interindividual differences in trans-acting factors that influence the expression of CYP2A13, we determined the relative expression of variant and wild-type (WT) CYP2A13 alleles in RNA samples from heterozygotes. A strategy combining single-nucleotide primer extension, capillary electrophoresis, and fluorescent di-deoxy terminator-based sequence analysis has been used in several recent studies for quantitative analysis of allelic variations in gene expression (Cowles et al., 2002; Matyas et al., 2002; Yan et al., 2002; Bray et al., 2003). However, because we did not have access to the necessary instrumentation, we developed an alternative, real-time PCR-based method for determining relative allelic expression in individuals heterozygous for CYP2A13 exon SNPs. Here we describe this method, and its successful application to the functional characterization of CYP2A13 alleles containing a 7520C > G variation located in the 3′-untranslated region (3′-UTR).
Materials and Methods
Preparation of DNA, RNA, and First-Strand cDNA. DNA and RNA were isolated from resected normal lung biopsy or autopsy tissues (provided by the National Cancer Institute Cooperative Human Tissue Network, National Cancer Institute) and from human fetal autopsy tissues (provided by the University of Washington Birth Defects Research Laboratory, Seattle, WA). DNA was also prepared from newborn blood spots in the New York State Newborn Screening Program. DNA samples used for the identification of individuals heterozygous for the 7520C > G variation were isolated using the DNeasy tissue kit (QIAGEN, Valencia, CA). DNA samples used for determining the 7520C > G variant allele frequencies were prepared from newborn blood spots using a protocol described previously (Caggana et al., 1998; Sheng et al., 2000). Total RNA was isolated from normal lung tissues with use of TRIzol (Invitrogen, Carlsbad, CA). First-strand cDNA was synthesized from 4 μg of total RNA in a total volume of 40 μl, using the SuperScript III first-strand synthesis system and an Oligo(dT)20 primer (Invitrogen). This study was approved by the Institutional Review Board of the New York State Department of Health.
Quantitative Analysis of CYP2A13 mRNA Level in Normal Lung Tissues. Real-time reverse transcription-polymerase chain reaction (RT-PCR) was carried out using a LightCycler instrument (Roche Diagnostics, Mannheim, Germany) and a LightCycler FastStart DNA Master SYBR Green I kit (Roche Applied Science, Indianapolis, IN). The mRNA levels were determined for CYP2A13 and for two housekeeping genes: TATA box binding protein (TBP) and β-actin. The primers used were 2A13-F (5′-acctggtgatgaccaccc-3′, located in exon 6) and 2A13-R (5′-cgtggatcactgcctctg-3′, located in exon 7); TBP-F (5′-gcacaggagccaagagtgaa-3′, located at the junction of exons 5 and 6) and TBP-R (5′-tcacagctccccaccatatt-3′, in exon 6; modified from Lossos et al., 2003, according to the sequence of GenBank accession no. NM_003194); and β-actin-F (5′-cctgactgactacctcatg-3′, in exon 3) and β-actin-R (5′-tccttctgcatcctgtcggca-3′, in exon 4). The sizes of expected PCR products were 204 bp for CYP2A13, 127 bp for TBP, and 396 bp for β-actin. PCR was carried out in a reaction volume of 10 μl. Reaction mixtures for quantification of CYP2A13 mRNA contained 1 μl of RT product, 2.3 mM MgCl2, 0.3 μM of each primer, and 1× LightCycler FastStart Reaction Mix SYBR Green I (Master mix). Reaction mixtures for the detection of TBP mRNA contained 2 μl of 10-fold-diluted RT products, 4 mM MgCl2, 0.4 μM of each primer, and 1× Master mix. Reaction mixtures for β-actin mRNA quantification contained 1 μl of 10-fold-diluted RT products, 3 mM MgCl2, 0.4 μM of each primer, and 1× Master mix. Reactions were initiated with denaturation at 95°C for 8 min, followed by 40 to 50 cycles of amplification (for CYP2A13, 50 cycles of 95°C for 5 s, 66°C for 8 s, 72°C for 8 s; for TBP, 40 cycles of 95°C for 5 s, 66°C for 8 s, 72°C for 8 s; and for β-actin, 40 cycles of 95°C for 10 s, 61°C for 5 s, 72°C for 15 s) and a final extension at 72°C for 1 min. Fluorescence was monitored at 85°C for CYP2A13, at 79°C for TBP, and at 86°C for β-actin.
All quantitative PCR reactions were performed in duplicate. Each PCR run included the standards (10-fold serial dilutions of cloned cDNA for CYP2A13, and 10-fold serial dilutions of gel-purified RT-PCR products for TBP and β-actin), a no-template control, and the first-strand cDNA samples. The data were evaluated with the Roche LightCycler Run 5.32 software, with use of the Fit Points method. Standard curves were generated from a minimum of four data points by plotting Ct (threshold cycles) against the log copy number of the templates. The relative levels of CYP2A13 mRNA in various total RNA preparations were normalized as number of copies of CYP2A13 cDNA per 1000 copies of TBP cDNA and per 100,000 copies of β-actin cDNA, in each RT product.
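The standard-curve quantification and normalization described above can be illustrated with a minimal Python sketch. It assumes an ordinary least-squares fit of Ct against log10 copy number; the function names and all numerical values are hypothetical and are not taken from the LightCycler software or from the study data. Corrections for the different RT-product input volumes and dilutions used for each gene are not shown.

```python
import math

def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate template copy number from a Ct."""
    return 10 ** ((ct - intercept) / slope)

def normalize(target_copies, reference_copies, per=1000):
    """Express target copies per `per` copies of a reference transcript."""
    return target_copies / reference_copies * per

# Hypothetical 10-fold serial dilutions (four points, the stated minimum).
standard_copies = [1e2, 1e3, 1e4, 1e5]
standard_ct = [33.0, 29.5, 26.0, 22.5]
slope, intercept = fit_standard_curve(standard_copies, standard_ct)

# Hypothetical sample: target Ct of 29.5, reference estimated at 2000 copies.
target = copies_from_ct(29.5, slope, intercept)
print(normalize(target, 2000.0, per=1000))
```

The same `normalize` call with `per=100000` would give the value per 100,000 copies of the second reference gene.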
Determination of CYP2A13 mRNA Sequence at the Position That Corresponds to the 7520C > G SNP Site. The 3′-half (exon 6 to the 3′-UTR; 804 bp) of CYP2A13 mRNA was amplified by RT-PCR (1st RT-PCR) using primers 2A13F and Exon 9R2 (Zhang et al., 2003). PCR was carried out on the LightCycler. Reaction mixtures, in a total volume of 10 to 20 μl, contained 1 to 2 μl of RT product, 3.0 mM MgCl2, 0.5 μM of each primer, and 1× Master mix. After a preincubation at 95°C for 8 min, the reaction was carried out for 60 cycles, with each cycle consisting of a denaturation at 95°C for 15 s and an annealing/extension at 68°C for 50 s, followed by a final extension at 68°C for 1 min. Products from the 1st RT-PCR were used as template to further amplify (in a nested PCR) CYP2A13 mRNA sequences from the exon 9 coding region to the 3′-UTR (238 bp), with primers 2A13E9F-RT (5′-cttcaagtcccctcagtcg-3′) and 2A13R-RT1 (5′-tgttccctctaaccacctct-3′). The reaction mixtures, in a total volume of 10 μl, contained 1 μl of 10-fold-diluted 1st RT-PCR product, 3.0 mM MgCl2, 0.4 μM of each primer, and 1× Master mix. After predenaturation at 95°C for 8 min, reactions were carried out for 50 cycles (each cycle consisting of a denaturation at 95°C for 15 s, an annealing at 66°C for 10 s, and an extension at 72°C for 11 s), followed by a final extension at 72°C for 1 min. Each PCR run included a no-template control to detect potential contamination of reagents. The nested PCR products were gel purified using a QIAquick Gel Extraction Kit (QIAGEN) and then sequenced in both forward and reverse directions using the nested PCR primers.
Quantitative Analysis of Allele-Specific CYP2A13 Expression. The procedure for allele-specific mRNA quantification consisted of the following steps: first-strand cDNA synthesis, 1st RT-PCR, nested PCR, melting-curve analysis, measurement of the heights of the melting peaks corresponding to the WT and variant alleles, calculation of the peak-height ratio between the two alleles, and transformation of the peak-height ratio into the template ratio between the two alleles, based on standard curves obtained in the same run. First-strand cDNA synthesis and 1st RT-PCR amplification of the 3′-half (exon 6 to the 3′-UTR) of the CYP2A13 mRNA sequence were carried out using the same method as described above, except that, for the 1st RT-PCR, 4 μl of RT product was included in a final volume of 20 μl. The nested PCR was carried out using the same primers as were used in the sequence analysis. PCR was carried out on the LightCycler using hybridization probes designed and synthesized by TIB MolBiol (Adelphia, NJ). The sequences of the Anchor and Sensor probes (the variation site is the fifth nucleotide of each Sensor probe) are as follows: Anchor probe, 5′-cctcccacaagccccgcccct-fluorescein-3′; Sensor [C] probe, 5′-LightCycler Red-640-ccccggccgtttccct-phosphate-3′ (for the WT allele); and Sensor [G] probe, 5′-LightCycler Red-640-cccccgccgtttccct-phosphate-3′ (for the variant allele). The reaction mixtures, in a total volume of 10 μl, contained 1 μl of 10-fold-diluted 1st RT-PCR product, 2.5 mM MgCl2, 0.4 μM of each primer, 0.3 μM Sensor [C] or Sensor [G] probe, 0.3 μM Anchor probe, and 1× LightCycler FastStart hybridization probe reagent mix (Roche Applied Science). The amplification conditions were the same as described in the sequence analysis section, except that fluorescence was monitored at 66°C.
Standard curves were generated in reactions with known ratios of the WT and variant templates (1:10, 1:5, 1:3, 1:1, 3:1, 5:1, 10:1, as well as variant only and WT only), which were purified PCR products amplified from cloned CYP2A13 fragments. Each quantitative PCR run included six to nine standards, a no-template control, and reactions with 10-fold-diluted 1st RT-PCR products or genomic DNA as templates.
Melting-curve analysis was carried out, after the final extension in the nested PCR, under the following conditions: 96°C for 30 s; 45°C for 1 min; and then to 85°C with varying ramp speeds of 0.1 to 0.4°C/s. Several melting-curve analyses were performed for each sample to generate well formed peaks. The peak-height ratio between the two alleles for a given sample was used to calculate the ratio of the two templates, through reference to a standard curve of template ratios versus melting peak-height ratios.
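The conversion of a melting peak-height ratio into a template ratio can be sketched as follows. This is a minimal illustration that assumes a straight-line standard curve fitted by ordinary least squares, with the peak-height ratio regressed on the known template ratio and then inverted; the standards' peak-height ratios and the sample values are hypothetical, not data from this study.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def template_ratio(variant_peak, wt_peak, slope, intercept):
    """Convert a V/W melting peak-height ratio into a V/W template ratio."""
    return (variant_peak / wt_peak - intercept) / slope

# Known V/W template ratios of the mixed standards (1:10 through 10:1).
standard_template_ratios = [0.1, 0.2, 1 / 3, 1.0, 3.0, 5.0, 10.0]
# Hypothetical measured peak-height ratios for those standards.
standard_peak_ratios = [0.8 * r + 0.02 for r in standard_template_ratios]

slope, intercept = fit_line(standard_template_ratios, standard_peak_ratios)
estimate = template_ratio(variant_peak=0.42, wt_peak=1.0,
                          slope=slope, intercept=intercept)
print(round(estimate, 3))
```

Estimates that fall below the lowest mixed standard (1:10) are extrapolations, so it may be safer to report them only as being below that limit.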
Determination of Frequencies of the CYP2A13 7520C > G SNP. DNA samples examined in this section were anonymous human genomic DNA samples isolated from newborn blood spots. More than 40 random samples were obtained for each of the four major ethnic groups (White, African American, Hispanic, and Asian) in the New York State Newborn Screening Program. The 7520C > G variation was detected by PCR-restriction fragment length polymorphism (RFLP) analysis using primers 2A13E8F (5′-actcctccatgcctgccactcc-3′) and 2A13R2 (5′-tgcctgcacatgatcacaaacatgcg-3′), which amplify a 1972-bp fragment (exon 8 to the 3′-UTR). PCR was carried out in a Thermal Cycler 9600 (Applied Biosystems, Foster City, CA), in a total volume of 25 μl, containing 10 μl of an aqueous solution of the genomic DNA from a blood spot (from a total of ∼30 μl), 2 mM MgSO4, 60 mM Tris-SO4 (pH 9.1), 18 mM (NH4)2SO4, 200 μM each dNTP, 0.2 μM each primer, and 0.5 μl of an elongase enzyme mix (Invitrogen). After initial denaturation at 94°C for 1 min, 40 cycles of amplification, each consisting of denaturation at 94°C for 30 s, annealing at 64.5°C for 45 s, and extension at 68°C for 3 min, were carried out. Final extension was at 68°C for 10 min.
RFLP analysis was initiated by digestion of the 1972-bp PCR product with the restriction enzyme EagI, which cuts the 7520C allele into two bands (823 and 1149 bp), but does not cut the 7520G allele. Reaction mixtures contained 5 units of EagI, 1× NE buffer 3 (New England Biolabs, Beverly, MA), and 3 to 5 μl of PCR product, in a total volume of 15 μl. The reaction was carried out at 37°C for 3 to 4 h, and the resulting DNA fragments were analyzed on a 1% agarose gel.
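The genotype assignment implied by the EagI digestion pattern can be written as a short Python sketch. The fragment sizes come from the text; the function names and the 50-bp size tolerance for reading bands off a gel are hypothetical choices made for illustration.

```python
UNCUT_BP = 1972        # full-length product; the 7520G allele is not cut by EagI
CUT_BP = (823, 1149)   # EagI fragments produced from the 7520C allele

def _has_band(band_sizes, expected, tolerance):
    """True if any observed band lies within `tolerance` bp of the expected size."""
    return any(abs(size - expected) <= tolerance for size in band_sizes)

def call_7520_genotype(band_sizes, tolerance=50):
    """Assign a 7520C > G genotype from estimated band sizes (bp) after digestion."""
    has_cut = all(_has_band(band_sizes, bp, tolerance) for bp in CUT_BP)
    has_uncut = _has_band(band_sizes, UNCUT_BP, tolerance)
    if has_cut and has_uncut:
        return "C/G"   # three bands: heterozygote
    if has_cut:
        return "C/C"   # two bands: WT homozygote
    if has_uncut:
        return "G/G"   # one uncut band: variant homozygote
    return "undetermined"

print(call_7520_genotype([830, 1140, 1980]))
```

Because incomplete digestion of a C/C sample would also leave an uncut band, a rule of this kind relies on control digests of samples with known genotype.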
Detection of Additional CYP2A13 SNPs. SNPs in other parts of CYP2A13 were detected by the sequencing of PCR products obtained using CYP2A13-specific primers. A 2304-bp fragment corresponding to the 5′-flanking region of CYP2A13 was amplified by PCR from genomic DNA using primers 2A13up-2059F (5′-acatcagagcctgtcctgtgc-3′) and 2A13int1R (5′-ccacaaagccccagccaactg-3′), as described by Fujieda et al. (2003). CYP2A13 intron 7 to the 3′-UTR was amplified using primers 2A13E8F and 2A13R2, generating a 1972-bp fragment. Amplification of the CYP2A13 5′-flanking region to exon 5 was carried out using primers 5′-F1 and Exon 5R (Zhang et al., 2003), generating a 4592-bp fragment. All PCR products were gel-purified using a QIAquick Gel Extraction Kit (QIAGEN) before DNA sequencing.
Statistical Analysis. Correlation analysis, regression analysis, Mann-Whitney rank sum test, and χ2 analysis were performed using either SigmaStat or Excel. The 95% confidence interval for the allele frequency was calculated, with a correction for continuity, using a program (http://faculty.vassar.edu/lowry/prop1.html) that is based on methods described by Newcombe (1998).
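As an illustration of the confidence-interval calculation, the following Python sketch implements the Wilson score interval with continuity correction, which is one of the methods described by Newcombe (1998). Whether the cited program uses exactly this form is an assumption here, and the genotype counts are hypothetical. Alleles are treated as independent observations (two per individual).

```python
from math import sqrt
from statistics import NormalDist

def allele_frequency(n_het, n_hom_variant, n_individuals):
    """Return (variant alleles, total alleles, variant allele frequency)."""
    variant_alleles = n_het + 2 * n_hom_variant
    total_alleles = 2 * n_individuals
    return variant_alleles, total_alleles, variant_alleles / total_alleles

def wilson_cc_interval(successes, n, confidence=0.95):
    """Wilson score interval for a proportion, with continuity correction."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = successes / n
    q = 1 - p
    denom = 2 * (n + z * z)
    if successes == 0:
        lower = 0.0
    else:
        lower = (2 * n * p + z * z - 1
                 - z * sqrt(z * z - 2 - 1 / n + 4 * p * (n * q + 1))) / denom
    if successes == n:
        upper = 1.0
    else:
        upper = (2 * n * p + z * z + 1
                 + z * sqrt(z * z + 2 - 1 / n + 4 * p * (n * q - 1))) / denom
    return max(0.0, lower), min(1.0, upper)

# Hypothetical group: 50 individuals, 8 heterozygotes, 1 variant homozygote.
variant, total, freq = allele_frequency(n_het=8, n_hom_variant=1, n_individuals=50)
low, high = wilson_cc_interval(variant, total)
print(f"{freq:.3f} ({low:.3f} to {high:.3f})")
```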
Results
CYP2A13 mRNA Levels Were Highly Variable in Adult Human Lung Tissue Samples. CYP2A13 mRNA levels were quantified using real-time RNA-PCR with gene-specific PCR primers in 18 resected normal lung biopsy samples. As shown in Table 1, the relative CYP2A13 mRNA levels, normalized by the abundance of TBP or β-actin transcripts in the same total RNA samples, were highly variable. TBP, which has no pseudogene (Vandesompele et al., 2002), was used, in addition to β-actin, to avoid possible confounding by the presence of processed β-actin pseudogene in contaminating genomic DNA, as illustrated by Hurteau and Spivack (2002). However, there was no evidence of significant DNA contamination in the RNA samples used in these experiments, as determined by monitoring of genomic DNA-derived PCR products (data not shown). Relative CYP2A13 mRNA levels calculated by comparison to the two different housekeeping genes were significantly correlated (r = 0.98; P < 0.01; analyzed using the Spearman rank order correlation test). Descriptive statistical analysis of two sets of data showed that the differences between the maximum and the minimum of CYP2A13 mRNA levels were >100-fold; the differences between the means of the highest four values and the means of the lowest four values were >50-fold; and the differences between the upper and lower quartiles were >6-fold. No correlation was found between the relative CYP2A13 mRNA levels and the patient age, gender, ethnicity, disease type, or the length of tissue processing time.
Table 1. Patient demographic information, CYP2A13 genotype, and levels of CYP2A13 mRNA in human lung tissues. Resected normal lung surgical biopsy tissues were used for preparation of DNA (for genotyping) and RNA (for quantitation of CYP2A13 mRNA level). CYP2A13 mRNA levels are normalized to those of two different housekeeping genes (TBP and β-actin).
The 18 tissue samples were chosen from a larger collection of adult lung tissues to represent individuals that are WT (A1–A7) or heterozygous (A8–A18) for the 3′-UTR 7520C > G variation. The relative CYP2A13 mRNA levels in samples A1–A6 were not significantly different from those in A8–A18 (P = 0.8; Mann-Whitney rank sum test). Sample A7 was not included in this comparison, because it was positive for the exon-5 3375C > T variation.
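The fold-differences used above to describe interindividual variation can be computed as in the following Python sketch. It assumes that each comparison is a simple ratio and that quartiles are taken with the default method of `statistics.quantiles`; the quartile convention actually used in the study is not stated. The input values are hypothetical.

```python
from statistics import mean, quantiles

def fold_variation_summary(levels):
    """Summarize spread of normalized mRNA levels as fold-differences.

    All levels must be greater than zero for the ratios to be defined.
    """
    ordered = sorted(levels)
    q1, _, q3 = quantiles(ordered, n=4)
    return {
        "max_over_min": ordered[-1] / ordered[0],
        "top4_over_bottom4": mean(ordered[-4:]) / mean(ordered[:4]),
        "q3_over_q1": q3 / q1,
    }

# Hypothetical normalized mRNA levels (arbitrary units).
summary = fold_variation_summary([1, 2, 3, 4, 5, 6, 7, 8])
print(summary)
```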
Qualitative Analysis of Expression Phenotype of the 7520G-containing Allele by Sequencing of RNA-PCR Products. A CYP2A13 mRNA fragment (from the exon-9 coding region to the 3′-UTR) was amplified using a nested RNA-PCR protocol, with RNA samples from 16 normal lung tissues, including six adult lung biopsy samples (A8–A13; Table 2), five perinatal autopsy lung samples (P1–P5; Table 3), and five fetal lung tissue samples (F1–F5; gestational days 101–115; two males and three females). All samples were heterozygous for the 7520C > G SNP, and WT for the exon 5 SNP site. Sequence analysis of purified RNA-PCR products indicated that, of the 16 lung samples, the CYP2A13 transcript derived from the 7520G allele was detected in only five samples (A8, A9, A12, F1, and F2; data not shown). Thus, it seemed that the variant allele was either not expressed or was expressed at much lower levels than the WT allele in the majority of the samples.
Table 2. Relative transcript levels of the CYP2A13-7520C and CYP2A13-7520G alleles in adult lung tissues. Total RNA from surgical biopsy lung tissues (A8–A13 in Table 1) was used for allele-specific RNA-PCR analysis as described in the legend to Fig. 3. All tissue samples were WT for the exon 5 SNP site but heterozygous for the 7520C > G variation. The copy number ratios of variant (7520G) to WT (7520C) mRNAs (V/W) were determined using both WT and variant probes.
Table 3. Perinatal human lung tissue samples used for determining the relative transcript levels of the CYP2A13-7520C and CYP2A13-7520G alleles. Autopsy lung tissues, provided by the Cooperative Human Tissue Network Pediatric Division, were obtained within 1 h of death. All tissue samples were WT for the exon 5 SNP site but heterozygous for the 7520C > G variation. The abundance ratios of variant (CYP2A13-7520G) to WT (CYP2A13-7520C) mRNAs were <0.10 for all samples.
Development of a Real-Time PCR-Based Method for Quantitative Analysis of Relative Expression of Variant Transcripts in Heterozygotes. We developed a real-time PCR-based method for measurement of relative allelic expression, by the combined use of allele-specific, fluorescently labeled oligonucleotide hybridization probes and melting-curve analysis. For a given hybridization probe (e.g., WT), the corresponding PCR product was detected as the peak with the higher melting temperature (Tm), whereas PCR product derived from the variant allele, having a single nucleotide mismatch with the probe, was detected as the peak with the lower Tm (Figs. 1A and 2A). Importantly, as shown in Figs. 1B and 2B, with the use of hybridization probes corresponding to either the WT allele or the variant allele, a significant linear correlation (r = 0.999; P < 0.01) was found between the melting peak-height ratios and the template concentration ratios of standard PCR products representing the two different alleles. The validity of this method was further confirmed in experiments using as templates genomic DNA from tissue samples known to be heterozygous for the CYP2A13 7520G allele. Of 10 different samples tested using this method, the ratio of variant allele over WT allele was 0.92 ± 0.12 when we used the WT probe, and 1.08 ± 0.20, when we used the variant probe, a result consistent with the expected 1:1 ratio of the two alleles in heterozygotes.
Fig. 1. Melting-curve analysis of PCR products from the standards detected by the WT probe. A, melting curves using the WT 7520C probe. PCR was performed with a standard template corresponding to the WT (W) or the 7520G variant allele (V), or with both templates at V/W ratios of 3:1, 1:1, 1:3 (as indicated), and at 1:5 and 1:10 (data not shown). No template was added to control reactions (H2O). Melting curves were obtained for all samples, with selected examples shown. Melting peaks at the lower temperature (∼56°C for V alone, and for samples with both templates, shifted higher due to interference by the peak with the higher Tm) were formed by the 7520G allele, and the peaks at the higher temperature (∼64°C) were formed by the 7520C allele. B, correlation between ratios of maximum peak heights and ratios of known template concentrations (r = 0.999).
Fig. 2. Melting-curve analysis of PCR products from the standards detected by the variant probe. A, melting curves using the variant 7520G probe. PCR was performed with a standard template corresponding to the WT (W) or the 7520G variant allele (V), or with both templates at V/W ratios of 3:1, 1:1, 1:3 (as indicated), and at 1:5 and 1:10 (data not shown). No template was added to control reactions (H2O). Melting curves were obtained for all samples, with selected examples shown. Melting peaks for the 7520C allele were at about 57 to 58°C, and those for the 7520G allele were ∼65°C. B, correlation between ratios of maximum peak heights and ratios of known template concentrations (r = 0.999).
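How melting peak heights might be extracted from raw melting data is sketched below in Python. The sketch assumes that melting peaks are taken from the negative first derivative of fluorescence with respect to temperature (-dF/dT), estimated here by central differences, and that each allele's peak is the maximum within a fixed temperature window. The synthetic curve, the window limits, and the function names are hypothetical.

```python
from math import exp

def negative_derivative(temps, fluorescence):
    """Central-difference estimate of -dF/dT at the interior temperature points."""
    points = []
    for i in range(1, len(temps) - 1):
        slope = (fluorescence[i + 1] - fluorescence[i - 1]) / (temps[i + 1] - temps[i - 1])
        points.append((temps[i], -slope))
    return points

def peak_height(derivative_points, t_low, t_high):
    """Maximum of -dF/dT within a temperature window (degrees C)."""
    return max(value for temp, value in derivative_points if t_low <= temp <= t_high)

# Synthetic melting curve: two probe-target duplexes melting near 56 and 64 degrees C,
# with the higher-Tm (perfectly matched) duplex contributing twice the signal.
temps = [45 + 0.1 * i for i in range(401)]
curve = [1.0 / (1 + exp(t - 56)) + 2.0 / (1 + exp(t - 64)) for t in temps]

deriv = negative_derivative(temps, curve)
mismatch_peak = peak_height(deriv, 53, 59)   # lower-Tm peak (mismatched allele)
match_peak = peak_height(deriv, 61, 67)      # higher-Tm peak (matched allele)
ratio = mismatch_peak / match_peak
print(round(ratio, 2))
```

With real data the two peaks overlap more than in this idealized curve, which is one reason a standard curve of mixed templates is needed rather than a direct reading of the ratio.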
Relative CYP2A13 Transcript Levels Determined by Allele-Specific, Quantitative RNA-PCR. The 16 RNA samples that had been used for qualitative analysis of expression phenotype were reanalyzed using the newly developed PCR method. As shown in Fig. 3 and Tables 2 and 3, results obtained with the two different probes are consistent. Furthermore, regardless of the tissue source (adult, perinatal, or fetal), the transcript derived from the 7520G-containing allele was much less abundant than that of the WT allele (7520C); the abundance ratios of variant to WT (V/W) mRNAs were <0.10 for 14 (including all perinatal and fetal samples and four adult samples) of the 16 samples tested. For the remaining two adult samples (Table 2), the V/W ratios were 0.12 and 0.24. We have confirmed that there were no sequence variations at the primer or probe binding sites in any of the 16 samples (data not shown).
Fig. 3. Melting-curve analysis of RNA-PCR products. Representative results are shown for one human lung sample (P3) heterozygous for the 7520C > G SNP. PCR products were detected with the WT probe (A) or the variant probe (B). PCR product from the variant template standard (V) is included for comparisons. RNA-PCR was performed in quadruplicate. In both experiments, peaks corresponding to the variant allele were consistently not detected. C, melting-curve analysis of PCR products amplified from P3 genomic DNA. About equal amounts of the two alleles were detected, thus confirming heterozygosity of the sample.
Distribution and Frequencies of the CYP2A13 7520G-Containing Variant Allele in Four Major Ethnic Population Groups. A PCR-RFLP method was developed for genotyping of the 7520C > G SNP. The EagI restriction enzyme cuts PCR products from the 7520C allele, but it does not cut PCR products from the 7520G allele. Thus, as shown in Fig. 4, two bands were detected in homozygous WT samples, whereas three bands were detected in heterozygotes. The validity of the PCR-RFLP method was confirmed with DNA samples of known CYP2A13 genotype (data not shown).
Fig. 4. Analysis of the 7520C > G SNP by PCR-RFLP. Genomic PCR products (1972 bp) were treated with EagI, which cuts the 7520C allele into two bands (823 and 1149 bp) but does not cut the 7520G allele. Lanes 1 and 2, two heterozygotes; lanes 3 and 4, two WT homozygotes. The approximate positions of selected fragments of a 1-kilobase DNA ladder are shown.
The frequencies of the 7520G-containing variant allele in four major ethnic groups are summarized in Table 4. The highest frequency was detected in African Americans (26.8%), followed by Hispanics (17.7%) and Whites (5.2%). The lowest allele frequency was detected in Asians (4.3%). The allele frequencies in African Americans and Hispanics were significantly higher than those in Whites and Asians (P ≤ 0.01). Moreover, three African American samples and one Hispanic sample were found to be homozygous for the variant allele. No significant frequency difference was detected between the male and female populations (male, 16.3%; female, 9.9%; χ2 = 2.756; P = 0.096). In addition, the observed genotype distribution did not deviate significantly from that expected according to Hardy-Weinberg equilibrium (χ2 = 0.436; P > 0.05).
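A Hardy-Weinberg goodness-of-fit test of the kind mentioned above can be sketched in Python as follows. The sketch assumes a biallelic locus tested with one degree of freedom, for which the upper-tail χ2 probability equals erfc(sqrt(χ2/2)). The genotype counts are hypothetical, and no correction for small expected counts is applied.

```python
from math import erfc, sqrt

def chi2_p_value_1df(chi2):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    return erfc(sqrt(chi2 / 2))

def hardy_weinberg_chi2(n_wt_wt, n_het, n_var_var):
    """Chi-square test of observed genotype counts against Hardy-Weinberg expectations."""
    n = n_wt_wt + n_het + n_var_var
    q = (n_het + 2 * n_var_var) / (2 * n)   # variant allele frequency
    p = 1 - q
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_wt_wt, n_het, n_var_var)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return chi2, chi2_p_value_1df(chi2)

# Hypothetical genotype counts: WT homozygotes, heterozygotes, variant homozygotes.
chi2, p_value = hardy_weinberg_chi2(70, 25, 5)
print(round(chi2, 3), round(p_value, 3))
```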
Table 4. Distribution and frequencies of the CYP2A13-7520C > G SNP in four major ethnic groups. Allele frequency was determined with random samples of anonymous human genomic DNA isolated from newborn blood spots in the New York State Newborn Screening Program.
Distribution of CYP2A13 Variations in Human Lung Samples Used for mRNA Analysis. The 7520C > G variation was previously identified in two different CYP2A13 alleles, CYP2A13*1H and CYP2A13*3 (www.imm.ki.se/CYP-alleles/cyp2a13.htm), each of which contains many additional SNPs. To determine whether one or both of the two alleles (which have six known SNPs in common) confer the low-expression phenotype, and to rule out the involvement of at least some of these SNPs in the decreased CYP2A13 expression, we genotyped all 16 human lung samples that had been used for the expression analysis. We analyzed the CYP2A13 gene at all SNP sites reported for the *1H allele, and at most of the SNP sites reported for the *3 allele. We also sequenced the entire first 2 kilobases in the 5′-flanking region, to identify any new SNPs.
As shown in Table 5, 11 of the 16 samples analyzed had the same SNP profile, which, however, is not identical to that of either the *1H or the *3 allele. The SNP profiles for the other five samples all had subtle differences from one another, as well as from the subset of 11 identical samples. The insertion of ACC between nucleotides 1634 and 1635 (with predicted insertion of a T between amino acid residues 133 and 134) was not detected in any of the 16 samples, and the 1706C > G (Asp158Glu) variation was detected in only one of the 16 samples. The combination of the ACC insertion and the 1706C > G SNP is a unique feature of the *3 allele; their absence indicates that the individuals with the low-expression phenotype do not contain the *3 allele. On the other hand, although seven of the nine SNPs previously designated as being associated with the *1H allele (-1479T > C, -1240A > G, -411G > A, 1757A > G, 6424C > T, 7233T > G, and 7520C > G) were detected in at least 15 of the 16 samples analyzed, the other two SNPs (2366C > T and 6432C > T) were only detected in one and two of the 16 samples, respectively. Furthermore, the 2211T > C variation, previously identified in *3, but not in *1H, was detected in 13 of the 16 samples analyzed. In addition, 13 of the 16 samples had two SNPs (2537A > C and 2593G > A) in intron 4 that had not previously been associated with a CYP2A13 haplotype. In all, sample A10 seemed to be the only one that contained the authentic *1H allele, if we assume that all variant sequences detected in this sample were on the same chromosome. The other 15 samples all contained previously undescribed CYP2A13*1 haplotypes, which share many common SNPs with the *1H and *3 alleles. Two of these samples (A13 and P4) also contained previously unknown SNPs in the 5′-flanking region (Table 5). However, the sequences of these new *1 haplotypes (i.e., the linkages between the contributing SNPs) remain to be confirmed experimentally.
Table 5. Distribution of CYP2A13 variations in human lung samples used for mRNA analysis. The 16 human lung samples that were used for allele-specific quantification of relative CYP2A13 mRNA levels were further analyzed for additional CYP2A13 variations in the range of -2 kilobases to 3′-UTR. The -729C > T SNP and the ACC insertion between nucleotides 1634 and 1635 (in exon 3) were not detected in any of the DNA samples, whereas the 1757A > G (in intron 3, or I3) and 7520C > G (in 3′-UTR) SNPs were heterozygous in all 16 samples. The -2000C > G, -1638G > T, and -886T > C variations have not been reported previously.
A further analysis of the data in Table 5 suggested that none of the identified SNPs, other than 1757A > G and 7520C > G, could be the common variation in all 16 samples that led to the low-expression phenotype. Of the 18 SNPs detected, only four (-1479T > C, -1240A > G, 1757A > G, and 7520C > G) were detected in all 16 samples. However, sample A8 was homozygous for -1479T > C and -1240A > G, but heterozygous for 7520C > G, which makes it unlikely that these two 5′-variations are responsible for the decreased expression of the 7520G-containing allele.
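The exclusion logic applied above can be expressed as a small Python sketch: a variation that is homozygous (or absent) in a sample that is heterozygous for 7520C > G is present on both (or neither) of that sample's chromosomes, so it cannot by itself distinguish the low-expression allele from the WT allele in that sample. The sample names and genotype calls below are hypothetical and do not reproduce Table 5.

```python
def candidate_snps(genotypes, reference_snp):
    """Return SNPs that are heterozygous in every sample heterozygous for reference_snp.

    genotypes maps sample name -> {SNP name: "het" or "hom"}; a SNP missing
    from a sample's dict is treated as not detected in that sample.
    """
    informative = [s for s, calls in genotypes.items()
                   if calls.get(reference_snp) == "het"]
    all_snps = {snp for calls in genotypes.values() for snp in calls}
    return sorted(
        snp for snp in all_snps
        if all(genotypes[s].get(snp) == "het" for s in informative)
    )

# Hypothetical genotype calls for two samples.
genotypes = {
    "S1": {"-1479T>C": "hom", "1757A>G": "het", "7520C>G": "het"},
    "S2": {"-1479T>C": "het", "1757A>G": "het", "7520C>G": "het", "2211T>C": "het"},
}
print(candidate_snps(genotypes, "7520C>G"))
```

A filter of this kind only narrows the list of candidates; as noted in the text, the linkage between the remaining variations still has to be established experimentally.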
Discussion
The functioning of a gene may be influenced by variations that directly affect its expression, as well as by variations that alter the structure and function of the encoded protein. It is currently difficult to identify variations that affect gene expression in human tissues, primarily because quantitative studies of gene expression in human tissues are often complicated by various factors, such as differences between comparison groups in patient medical history or specimen quality. Recently, several studies have been reported that illustrate the potential of a new strategy for studying allelic variations in human gene expression (Cowles et al., 2002; Matyas et al., 2002; Yan et al., 2002; Bray et al., 2003). In this approach, the relative expression levels of two alleles of a given gene are determined in the same RNA samples. Thus, we can avoid any impact of sample-to-sample differences in tissue preservation and processing, or of interindividual differences in trans-acting regulatory factors, on the detected relative transcript levels of the two alleles.
In the strategy used in previous studies for measuring relative allelic expression (Cowles et al., 2002; Matyas et al., 2002; Yan et al., 2002; Bray et al., 2003), the amplified RT-PCR products were differentially labeled by single-nucleotide primer extension, using fluorescent di-deoxy terminators; the labeled products were resolved by capillary electrophoresis, and detected by laser-induced fluorescence. In the present study, we have developed an alternative strategy, in which the two alleles are distinguished, after RT-PCR amplification using real-time PCR, by their differential hybridization to fluorescently labeled oligonucleotide probes that are designed to target the SNP site under investigation.
Real-time PCR is frequently used for quantitative analysis of gene expression at the mRNA level. Because of its capability for melting-curve analysis, real-time PCR is also useful for genotype analysis, in which PCR products derived from heterozygous alleles are detected as peaks with differing Tm. We noticed in our preliminary studies that, when melting-curve analysis was performed, the peak height correlated with the amount of PCR product. Thus, the relative levels of PCR products from heterozygous alleles, produced using a common set of PCR primers, can be determined by comparing the ratio of the melting-peak heights of the two PCR products to those found for known amounts of standard DNA templates. Because mRNAs derived from two alleles that differ by only one or a few nucleotides are generally expected to be reverse-transcribed and amplified with similar efficiencies, the relative level of PCR products should be proportional to that of the transcripts derived from the two alleles in a given RNA sample.
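The standard-curve step described above can be sketched as follows. This is a minimal illustration rather than the procedure used in this study: the log-log linear form of the curve, the function names, and the synthetic standards are all assumptions made for the example.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def build_standard_curve(template_ratios, peak_height_ratios):
    """Fit log10(peak-height ratio) against log10(template ratio).

    Assumption: the relationship is linear on a log-log scale.
    """
    xs = [math.log10(r) for r in template_ratios]
    ys = [math.log10(r) for r in peak_height_ratios]
    return fit_line(xs, ys)

def estimate_template_ratio(peak_height_ratio, curve):
    """Invert the standard curve to recover an allele template ratio."""
    slope, intercept = curve
    return 10 ** ((math.log10(peak_height_ratio) - intercept) / slope)

# Hypothetical standards: mixtures of the two allele templates at known ratios.
standard_template_ratios = [0.1, 0.25, 0.5, 1.0, 2.0, 4.0, 10.0]
# Synthetic peak-height ratios generated from an arbitrary power law,
# used only so that the example is self-checking; these are not measured data.
standard_peak_ratios = [0.9 * r ** 0.8 for r in standard_template_ratios]

curve = build_standard_curve(standard_template_ratios, standard_peak_ratios)

# A sample whose observed peak-height ratio corresponds to a 3:1 template ratio.
sample_peak_ratio = 0.9 * 3.0 ** 0.8
estimated_ratio = estimate_template_ratio(sample_peak_ratio, curve)
print(f"estimated allele template ratio: {estimated_ratio:.2f}")
```

In practice, a separate curve would be fitted for each probe, so that estimates obtained with the WT and variant probes can be compared against each other.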
The accuracy of our method for relative allelic expression analysis was confirmed for the 7520C > G SNP in studies using standard templates. In all studies, the correlation between the melting peak-height ratios and the corresponding template concentration ratios was highly significant. Template ratios determined for heterozygous genomic DNA samples, using either WT or variant probes, were also consistent with the expected 1:1 ratio of the two alleles. Performance of melting-curve analysis using both WT and variant probes increased the reliability of the results. Furthermore, all reactions were performed in duplicate, and a greater than usual amount of the RT products was used for the PCR step to minimize potential sampling errors, because the abundance of the CYP2A13 transcript is low.
It should be noted that the real-time PCR-based method for measurement of relative allelic expression may not work for all exon SNPs. Difficulties may arise if the sequences flanking the SNP site are not suitable for probe design, or if they contain additional polymorphisms that affect probe binding. Standard curves should be constructed for each SNP. In addition, small differences in transcript level (e.g., <20%) will be difficult to quantitate.
Few studies have examined the impact of SNPs on CYP expression in human lung. In the present study, using this newly developed method for measuring relative allelic expression, we discovered that CYP2A13 transcripts of the 7520C-containing allele were >10-fold more abundant than those from the 7520G-containing allele, in the lung tissue of 14/16 heterozygotes analyzed. To our knowledge, this is the first report of genetic polymorphisms that affect CYP2A13 expression, as well as the first application of the relative allelic expression approach for a direct and quantitative analysis of CYP expression in human tissues.
The anticipated near-absence of CYP2A13 transcripts in individuals homozygous for the 7520G allele would likely lead to dramatic decreases in CYP2A13 protein levels. Individuals homozygous for the 7520G allele were detected in the African American (3/41) and Hispanic (1/48) population groups, which had the highest allele frequencies for this variation (26.8 and 17.7%, respectively), although none of the lung samples analyzed was homozygous for this allele. A dramatic decrease in CYP2A13 protein levels will likely cause a significant decrease in the rate of NNK metabolic activation in the lung, and consequently a much-reduced susceptibility to development of smoking-induced lung adenocarcinoma. This prediction is based on a recent finding that the Arg257Cys variant allele (CYP2A13*2), which has a mere 50% decrease in metabolic activity (Zhang et al., 2002), was significantly associated with a substantial reduction in the risk of smoking-induced lung adenocarcinoma in a large-scale case-control study (Wang et al., 2003).
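As a rough consistency check on the homozygote counts cited above, the number of 7520G homozygotes expected in each group can be estimated from the allele frequency. The sketch below assumes Hardy-Weinberg proportions, an assumption that is not made or tested in this study; the function name is hypothetical, and only the allele frequencies, group sizes, and observed counts are taken from the text.

```python
def expected_homozygotes(allele_frequency, n_individuals):
    """Expected number of variant homozygotes under Hardy-Weinberg
    proportions (assumed here): q^2 * N."""
    return allele_frequency ** 2 * n_individuals

# Allele frequencies, group sizes, and observed homozygote counts from the text.
groups = {
    "African American": {"q": 0.268, "n": 41, "observed": 3},
    "Hispanic": {"q": 0.177, "n": 48, "observed": 1},
}

for name, g in groups.items():
    expected = expected_homozygotes(g["q"], g["n"])
    print(f"{name}: expected {expected:.1f}, observed {g['observed']}")
```

Under this assumption the expected counts are roughly 2.9 and 1.5, close to the observed 3 and 1, although the groups are too small for a formal test.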
The maximal decrease in total CYP2A13 mRNA level resulting from the essential loss of expression of the 7520G-containing allele in heterozygous individuals would be ∼50%. However, the impact of such a decrease on total CYP2A13 mRNA levels in heterozygotes will be difficult to determine, because the expression of both the 7520G and 7520C alleles is likely subject to additional influences by numerous trans-acting factors, such as medical history and environmental exposures, and genetic polymorphisms in the proteins that regulate the expression of CYP2A13 in the respiratory tract. Indeed, total CYP2A13 mRNA levels varied by >100-fold in both 7520C/C and 7520C/G groups. Therefore, the functional impact of the 7520G-containing haplotypes will be variable in heterozygotes.
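The ∼50% ceiling follows from simple arithmetic, sketched below. The calculation assumes that each 7520C allele contributes equally to the total transcript pool and that there is no compensatory up-regulation; both are simplifying assumptions, and the function name is hypothetical.

```python
def heterozygote_fraction(fold_reduction):
    """Total mRNA in a C/G heterozygote relative to a C/C homozygote,
    when the G allele is expressed at 1/fold_reduction of the C allele.

    Assumes equal output from each C allele and no compensation.
    """
    return (1.0 + 1.0 / fold_reduction) / 2.0

for fold in (10, 100, 1000):
    decrease = 1.0 - heterozygote_fraction(fold)
    print(f"{fold}-fold lower G-allele expression: {decrease:.1%} decrease in total mRNA")
```

A 10-fold lower expression of the 7520G allele already corresponds to a 45% decrease, and the decrease approaches but never exceeds 50% as the G-allele contribution falls toward zero.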
The mechanisms underlying the association of the 7520C > G variation and decreased CYP2A13 expression remain to be determined. It is not clear whether the 7520C > G variation is responsible for the low transcript abundance of the variant allele, or whether it is only a marker variation. Analyses of the SNP distribution among the 16 samples heterozygous for the 7520C > G SNP seem to rule out all other known SNPs, except 1757A > G, in the coding, intronic, and 5′-flanking regions (up to -2 kilobases) as the common cause of low expression of the 7520G-containing allele in all samples examined. It will be helpful to learn whether the 7520C > G variation, which is located in the 3′-UTR, plays a role in the regulation of CYP2A13 mRNA stability. SNPs in the 3′-UTR of the human glycoprotein PC-1 gene (Frittitta et al., 2001), and a variant in the 3′-UTR of the human protein tyrosine phosphatase 1B gene (Di Paola et al., 2002), have been found to modify mRNA stability of the corresponding genes. It will also be interesting to determine whether the 1757A > G variation, which is located in intron 3, affects RNA splicing, or whether the low-expression phenotype is associated with a decreased rate of CYP2A13 transcription. Numerous instances have been reported in which SNPs affect splicing of a P450 transcript, such as the cases found for CYP1A2 (Allorge et al., 2003), CYP2C19 (de Morais et al., 1994; Ibeanu et al., 1999), and CYP3A5 (Kuehl et al., 2001), although the CYP2A13 1757A > G SNP does not occur at a known splicing site. Several examples have also been reported where SNPs in the promoter or enhancer region affect CYP expression, such as the cases found for CYP1A2 (Nakajima et al., 1999; Sachse et al., 1999; Aklillu et al., 2003), CYP2A6 (Pitarque et al., 2001, 2004; Kiyotani et al., 2003), and CYP2B6 (Lamba et al., 2003). 
Finally, it will be interesting to determine whether the low-expression phenotype of the CYP2A13 7520G-containing allele also occurs in the olfactory mucosa, where CYP2A13 is expressed at much higher levels than in the lung (Su et al., 2000). Significant interindividual differences in the level of CYP2A proteins have been observed in microsomes from fetal nasal mucosa (Gu et al., 2000), consistent with potential involvement of genetic polymorphisms that affect CYP2A13 expression.
Acknowledgments
We acknowledge the use of the Molecular Genetics Core and the Biochemistry Core of the Wadsworth Center. We thank Drs. Laurence Kaminsky and Adriana Verschoor for reading the manuscript. We also thank Cooperative Human Tissue Network Eastern Division at the University of Pennsylvania Medical Center (Philadelphia, PA), Cooperative Human Tissue Network Pediatric Division at the Children's Hospital Research Institute (Columbus, OH), and Dr. Alan Fantel of the Birth Defect Research Laboratory at the University of Washington (Seattle, WA) for providing human lung tissues.
Footnotes
- This work was supported in part by Public Health Service Grants CA092596 (to X.D.) and HD00836 (to Dr. Alan Fantel of the Birth Defect Research Laboratory, University of Washington) from the National Institutes of Health.
- Article, publication date, and citation information can be found at http://jpet.aspetjournals.org.
- doi:10.1124/jpet.104.069872.
- ABBREVIATIONS: P450, cytochrome P450; NNK, 4-(methylnitrosamino)-1-(3-pyridyl)-1-butanone; SNP, single-nucleotide polymorphism; UTR, untranslated region; RT, reverse transcription; PCR, polymerase chain reaction; TBP, TATA box binding protein; bp, base pair(s); WT, wild-type; RFLP, restriction fragment length polymorphism; Tm, melting temperature.
- Received April 12, 2004.
- Accepted June 1, 2004.
- The American Society for Pharmacology and Experimental Therapeutics