Structure- and Energy-based Analysis of Small Molecule Ligand Binding to Steroid Nuclear Receptors

The role of the classical steroid nuclear receptors in hormone-related cancer is of increasing interest as the picture is complex and at times appears conflicting. The functions of the various players by themselves and in relation to each other in cancer pathology are just beginning to be revealed. Exogenous ligands binding outside of the ligand binding pocket in the ligand binding domain are of interest because they have potential to bind generically to all receptors, resulting in cross-target effects or therapeutic potential. We study the binding of five such ligands in order to ascertain the potential for such effects. In particular, we propose a contact and energy-based approach to understanding ligand binding in steroid nuclear receptors. We also extend the knowledge of a hydrophobic binding cleft as a potential target for therapeutics. Finally, we characterize the predicted binding modes of five drug-like molecules across five steroid nuclear receptors.


INTRODUCTION
The role of the classical steroid nuclear receptors (SNR) in hormonerelated cancer is of increasing interest.Our current understanding, however, falls short; the picture is complex and at times appears conflicting.The roles of the various players by themselves and in relation to each other are just beginning to be revealed in cancer biology [1].
SNRs are members of the superfamily of nuclear receptors that function as transcription factors.The steroid receptor family includes the sex hormone receptors Estrogen Receptor- (ER-), Estrogen Receptor- (ER-), and Androgen Receptor (AR), as well the Progesterone Receptor (PR), Glucocorticoid Receptor (GR), and Mineralocorticoid Receptor (MR).The receptors have a wellcharacterized and structurally-conserved ligand binding domain with remarkable ligand specificity in most cases [22].Their classical mechanism of action is as a response to affect local biology at the cellular level in response to physiological states, which occurs through endogenous ligand binding in the ligand binding domain [9].Under normal physiological conditions, steroid receptors have functions in the growth and differentiation of tissue [20].
Significant evidence for the role of the ERs, AR, and PR in the growth and proliferation of cancer cells have been described [7].The intracrine environment of cancer is of increasing interest as evidence of more complicated relationships and cross-talk between the receptors are discovered [20].Therefore, understanding exogenous ligand binding is of high therapeutic interest.SNRs are distributed throughout many tissues, which complicates the search for drugs that do not have off-target effects [3].Further complications are added by the documented cross-talk between receptors and their differential gene regulation as homo-and heterodimers [5].
SNRs have sheltered hydrophobic clefts ideal for ligand binding [3].Ligand binding in the hydrophobic cleft on the opposite side of the receptor from the AF-2 site in SNRs is currently not well characterized.Ma and coworkers (discussed in detail in Related Work) found small molecule drugs that bound within the cleft on ER- [8].In this work, we assess the binding of five of the most promising candidates for binding at this site across SNR: NSC4631, NSC35446, NSC81747, NSC88999, and NSC372127.
We advance the understanding of exogenous ligand binding in SNRs.We carry out docking simulations and a novel structure-and energy-based approach to reveal potential bindings sites, poses, and conformations of interest.In particular, we extend the knowledge of the hydrophobic binding cleft as a potential target for therapeutics.Finally, we characterize the predicted binding modes of five drug-like molecules across all five SNRs.The rigorous simulation and analysis that we carry out in this paper is the first thorough computational study into ligand binding of SNRs.
The rest of this paper is organized as follows.The Related Work section provides some more details into our current understanding of SNRs and ligand binding.The Methods section describes the computational approach in this study, the results of which are related in the Results & Analysis section.The paper concludes with a discussion of our observations and future directions in the Conclusion section.

RELATED WORK
To understand SNRs, one must understand BRCA1, a tumor suppressor protein whose role is the maintenance of DNA integrity in concert with BARD1 [18].Mutations in BRCA1 were identified as the major cause of hereditary breast cancer by Miki et al. in 1994 [10].They are also associated with other cancers such as ovarian, uterine, prostate, and colon [23].A key idea that Rosen identified was that while the DNA repair hypothesis should ensure that BRCA1 mutations cause cancer in general, it is observed that they are highly involved with hormone-related cancers specifically [16].
In the late 90's, the association of BRCA1 with ER-, PR, and AR were characterized [16].BRCA1 has different roles with these receptors.In the case of ER and PR, BRCA1 suppresses the transactivation of the receptors, suppressing cell growth and proliferation.In contrast, BRCA1 enhances the transactivation of AR.The suppression of ER- activity is mediated by the AF-2 region on the ligand binding domain of ER- [2].BRCA1 binds AR in an analogous manner, however the transactivation of AR by BRCA1 is mediated by AF-1 in the N-terminal domain [13].The binding of BRCA1 and PR occurs outside of the ligand binding domain [6].Interestingly, the mechanism of action in PR does not involve AF-1 or AF-2, but a conformational change that prevents the binding of PR to the progesterone response element (PRE) [4].
In breast cancer, the effects of sex steroid receptor positivity are subject to contradictory stories, suggesting that it might be the ratio of AR/ER- that has an effect [1].GR is also of increasing interest as it has been suggested that GR upregulates BRCA1 [15].Additionally, like AR and ER, the loss of GR function is strongly linked to BRCA1related cancer [21].Research on the protectiveness of GR positivity and function in breast cancer has yielded conflicting results, further emphasizing the role of crosstalk and other relationships between the steroid nuclear receptors play in cancer biology [12].
MR has no reported association with BRCA1 as far as we are aware, but because it is highly related to GR, we include it in the analysis in this paper.Taken together, the available evidence suggests that the relationships between the SNRs and BRCA1 have explanatory power in the hormone-relatedness of BRCA1 cancers.
Ma and colleagues reported on small molecule drugs that disrupted the association of BRCA1 with ER- [8].We summarize their findings here.
Their analysis showed that the binding of the "BRCA1-mimetic" drugs was consistently in a hydrophobic cleft of ER- separate from the ligand binding site [7].This site is the putative location of direct ER--BRCA1 interaction responsible for the suppression of ER- activity [8].The name "BRCA1-mimetic" refers to this binding site and the ability of the molecules to downregulate ER- transcription in a BRCA1-like manner, an ability which is lost or reduced in BRCA1 mutant cancer.They assessed the location of the drugs by demonstrating that the mutation of predicted contacts affected their inhibitory potency [8].
This discovery represented a novel treatment strategy of SNRs in cancer because the newly identified molecules did not compete with ligands in the ligand binding pocket, in contrast to other treatments.Ma et al. demonstrated that the drugs suppressed tumor growth in multiple cell lines.Two of the reported drugs, NSC4631 and NSC35446, partially restored the sensitivity of tamoxifen-resistant cell lines to tamoxifen.In addition, Ma and colleagues raised the possibility of a conformational change that prevented the binding of activated ER- to the estrogen response element (ERE).Interestingly, the drugs showed little inhibitory activity in PR [7].
Nonetheless, the possibility of large-scale conformational change is of particular interest because it may have a similar effect in other SNR members.If so, it could represent an important therapeutic tool in steroid receptor drug development for cancer biology and beyond.
The current front line treatment for both AR+ prostate and ER+ breast cancer is anti-androgen and anti-estrogen treatment, respectively [7].However, if cancers are responsive to these treatments in the first place, they often develop resistance that leads to recurrence [8].This is another reason that the strategy of a conformational change would be of interest in the treatment of hormone-related cancer.Because of the high degree of structural conservation in the family, we reasoned that it would be beneficial to probe the binding of these drugs across the family to provide structural insights into the binding site for future use in therapeutic development.

Input Data
Crystal structures are utilized to represent the native state of the classical steroid receptors and are downloaded from the Protein Data Bank (PDB) from the following entries: PR -PDB ID 1A28; MR -PDB ID 2AA2; AR -PDB ID, 2Q7I; ER- -PDB ID 3ERT; ER- -PDB ID 3OLS; and GR -PDB ID 4P6X.PR, MR, AR, ER-, and GR structures are co-crystallized with their classical ligands: progesterone, aldosterone, testosterone, estrogen, and cortisol, respectively.ER- is co-crystallized with the selective estrogen receptor modulator (SERM) 4-hydroxytamoxifen.

Sequence Analysis
Our first investigation concerns comparing sequences of the SNRs in order to reveal any conservation for the purpose of identifying potentially-conserved binding sites.For this purpose, the receptor sequences (in .fastaformat) were obtained from the respective PDB entries listed above.Multiple sequence alignment (MSA) was performed using the Clustal Omega server [19] (https: //www.ebi.ac.uk/Tools/msa/clustalo/);Sequence homology was confirmed using the pairwise identity matrix.The resultant alignment was fed into pyMSAviz [17] and colored by identity.Markers were added at the positions of residues involved in BRCA-mimetic binding in ER- as obtained from docking simulations (see details in Section 3.7 Contact Computation and Analysis).ER- residues important to at least one BRCA1-mimetic were annotated with a purple dot.The consensus bar was also included in the visualization.Results of this analysis are related in Section 4 Results & Analysis.

Structural Analysis
In addition to sequences, we also compare structures of the receptors.Structural analysis was conducted by retrieving the .pdbfiles of crystal structures from the PDB.Water and other molecules were removed, leaving only the receptors and their respective co-crystalized ligands.Structure files were loaded into ChimeraX v1.6.1 [14], and the Matchmaker tool was employed to align all structures using ER- (PDB ID: 3ERT) as the reference.The output of the tool includes an optional sequence alignment, which was used to select residues at corresponding sequence alignment positions as those predicted to interact with the BRCA1-mimetics in ER-.
To compare the positions of docked BRCA1-mimetics across receptors, the top-scoring poses (see Section 4.2 Sequence and Structure Alignment Analysis for details) were written as .pdbfiles with CONECT records using AutoDockTools v1.5.6(ADT).The corresponding receptor and BRCA1-mimetic were loaded into ChimeraX and merged using the combine command.All receptors with a specific drug were aligned as in the structural analysis, then separated into individual models using the split command.The aligned drug molecules were then saved as .pdbfiles.

Blind Redocking Simulations for Parameter Selection
Blind re-docking simulations were employed to validate the simulation parameters.The crystal structures with co-crystallized ligands removed (otherwise prepared as described above) were docked with their corresponding ligands in AutoDock v4.2 (AD) [11] using AutoDockTools v.1.5.6.(ADT).Ligand structures were retrieved from PubChem, to prevent biased conformational sampling by using the crystal ligand structure as input.For each receptor, ADT was used to draw a grid box of maximum size (npts = (126, 126, 126) spaced at .375 Å) around the free receptor.The number of maximum energy evaluations was set to 1, 000, 000 for each simulation (as suggested by AD developers).The treatment of unbound ligand structures was set to extended because we did not see evidence of unexpectedly favorable energies upon binding in preliminary simulations.49 simulations were run for each receptor and ligand pair, yielding 490 bound structures.The full set of docked ligands were reclustered with a tolerance of 2.0 Å.The ligand pose and conformation with the lowest/best energy was selected for comparison to the ligand pose and conformation in the receptor-ligand crystal structure after verifying that it also belonged to the largest cluster (as well as visually inspecting for good fit to the crystal structure).

Redocking Simulations for Validation
The above re-docking simulations were also used as ground truth for our contact-based analysis related in Section 4.3 Contact Frequency and Energy Analysis.ADT was used to extract the close contacts between the receptor and ligand crystal structures, which are used as reference in our analysis of contacts in Section 4.

Blind Docking Simulations for Production and Observations
With the parameters identified as above, blind docking simulations were then run for all receptors using the BRCA1-mimetic molecules.However, in order to avoid docking within the ligand binding pocket, receptor structures were simulated with the ligand binding pocket occupied by the ligand.We found in preliminary simulations that the BRCA1-mimetics tended to result in top scoring structures in the ligand binding pocket, but with energies that indicated that they were not competitive with the binding of the co-crystallized ligands.This is further supported by work in [8], where the small molecules were not found to compete with ligand binding in ER-.Each of the five receptors was docked with each of the five BRCA1-mimetics resulting in 25 receptor-ligand pairs.For each receptor-ligand pair, 49 simulations were performed, resulting in a total of 1, 225 simulations and 12, 250 docked structures over all pairs.
3.6.1Comparison of production runs to wet lab results.As an additional check on validation, we compared the predicted close contacts for our NSC35446 simulations with those tested in [8] and found that our method identified all of the seven residues important to binding, two that were predicted to not affect binding, and two that were not tested.This discrepancy can be attributed to the different pose selection methods employed by our group and theirs.

Contact Computation and Analysis
Contacts were extracted from all crystal structures, redocking simulations, and BRCA1-mimetic docking simulations using ADT's Show Interactions function.The code was manually changed such that playing conformations in the conformation player automatically output the list of close contacts for that conformation to the console.In the case of the crystal structures, Show Interactions directly yielded the list of "true" close contacts found between the ligand and receptor, which were then utilized as a ground truth for redocking simulations.For all simulations, energies and ranking information was extracted from the reclustered data and combined with close contacts in a database for analysis.Contact frequency information is based on all close contacts that appear at least once over all simulations for a given ligand-receptor pair.
For each ligand-receptor pair, crosstabulation of contacts and energies was performed to asses the frequency of each contact in each energy bin.The crosstabulations were visualized as heatmaps for analysis.

RESULTS & ANALYSIS 4.1 Experimental Setup
We relate several analyses.First, utilizing sequence and structure alignment, we reveal that the BRCA1-mimetic binding site as identified in ER- exists across the classical steroid receptors.In one analysis, we provide the contacts of the ligated native structure as a reference, to validate the docking simulation parameters and serve as a point of comparison for analyzing the docking of drug-like molecules.Then we relate the results from the BRCA1-mimetic docking simulations, focusing on the receptor-ligand contacts.In another analysis, we provide both a structural and energetic view to differentiate between more and less energetically-stable contacts, concluding that there is a unique signature of good binding poses in contact-energy space.Details of these analyses can now be found below.

Sequence and Structure Alignment Analysis
The multiple sequence alignment (MSA) of all the receptors is shown in the top panel of Fig. 1.Inspection of this MSA shows overall low sequence identity among all members of the group; however the lowest pairwise identity reaches the standard threshold for homology at 20.25% (for the ER- to AR pair).The highest sequence identity is obtained between ER- and ER- at 59.17%.ER- residues identified as interacting with any of the 5 BRCA1mimetics are referred to as the binding site in this analysis.Among the binding site positions indicated in with purple dots in the top panel of Fig. 1, there are several positions with moderate to high sequence identity, with 7 of 11 positions being at least weakly similar according to the ClustalW schema.In addition, the residues at position 40 of the MSA are highly similar across most of the receptors, with the exception of the histidine in ER-.The small number of residues across all top scoring poses of the drug-like molecules and all receptors (details below) implies the existence of a single, co-located binding site.The sequence alignment demonstrates a consistency in sequence of predicted binding site residues, suggesting that the binding site is (evolutionarily) well-conserved across the steroid nuclear receptor subfamily.
The bottom panel of Fig. 1 shows the structural alignment of the receptors.The alignment is obtained by minimizing the global root-mean-squared-deviation (rmsd) of a target structure over a reference structure (ER-, in blue) via the ChimeraX Matchmaking tool [14].The alignment demonstrates the structural conservation of the nuclear receptor superfamily ligand binding domain fold.Taken together, information gleaned from sequence and structure alignment suggests that the BRCA1-mimetic binding site exists (and is preserved) across all classical steroid receptors.
We further investigate the homogeneity of the BRCA1-mimetic binding site across all receptors.As a representative example, the image shows the overlay of residues predicted to be in close contact with NSC4631 in ER-.The labels refer to the residues present in ER-.The figure makes clear that the binding site is spatially similar across all receptors.Thus it is clear that we can expect the BRCA1-mimetics to bind in the same location across all steroid nuclear receptors.
We observe significant structural diversity in BRCA1-mimetic binding poses across receptors, in the previously identified ER- binding site.An illustrative example is related visually on ER- in Fig. 2. In the figure, the top binding poses of NSC4631 taken from all five NSC4631-receptor simulations are shown.The structures overlap considerably.

Contact Frequency and Energy Analysis
Validation Simulations.A quantitative analysis is related in Fig. 3.When natural ligands are used as in the case of AR, GR, and PR, redocking simulations demonstrate a remarkable fidelity to the true binding sites extracted from the crystal structures as described in 3.7.In the figure, the blue arrows indicate the native close contacts as detected by AutoDock.Because natural ligands and binding sites are subject to co-evolution, we reason that true contacts will have high frequencies in the docking simulations.
Indeed, the high-frequency contacts represent the true binding site of the ligand, which is confirmed visually by overlaying the ligand and the crystal structure (data not shown).The high signal to noise ratio can also be seen in the residue frequency plots, where true contacts are likely to be among the highest frequency residues.This is in contrast to ER-, which is co-crystallized with tamoxifen, a selective estrogen receptor modulator (SERM) drug used in the treatment of breast cancer.In this case, most of the true contacts have frequencies below 20%.Nonetheless, AutoDock is able to find a top pose close to the crystallized tamoxifen structure.This analysis demonstrates that the frequency of contacts alone cannot reliably identify the binding site except for in the best case scenario of co-evolved ligands and receptors.
In the bottom panel of Fig. 3 we relate heatmaps of contact frequency binned by energy.As this panel shows, in the structures with natural ligands -AR, GR, and PR, the most frequent contacts are also the lowest energy bin, as expected.This is in contrast to ER-, where the most frequent contacts are at a higher energy than is realistic for a known competitive binder of the natural ligand.In addition to the frequency, the energy is banded across many energy levels for each contact.The bands indicate a less-funneled energy landscape where there is more flexibility within the site relative to native ligands in their binding pockets.
In particular, sequentially-clustered regions with bands across many energy levels indicate that the ligand is moving around in and near the binding site with better or worse energy defined primarily by the interactions within the binding site -van der waals, hydrogen bonding, and desolvation energy.
There are clear differences in the "contact profile" at the different energetic levels.The density of contacts at a certain energy level demonstrates the locations (for which contacts are a proxy) that correspond to a certain energy level.Consider that if we have a group of contacts indicated by density along the x axis, then we know that a larger set of different locations correspond to that energy bin.Note that we are only focusing on the energy bands here and not the frequencies.
At the best energies, possible contacts are sparsely populated; they correspond to the contacts in the best structures and indicate a good binding site that is closer to the co-evolved structure signature of consistent contacts at low energy.At middling energies, we see denser contacts (along x axis) and so secondary binding sites become visible.Differences between the primary and secondary binding sites inform us on which contacts and locations are favorable or not.In particular, contacts that are associated only with the top energies are likely to be realistic and important contacts that are prime candidates for testing in wet-laboratory experiments.On the other hand, contacts that are present only at lower energies may possibly indicate transient structures that can be down-prioritized for wet-laboratory experiments focused on therapeutic potential.
Analysis of Production Simulation Runs. Figure 4 shows the results from our contact energy analysis for all SNRs tested and for all BRCA1-mimetics docked.We see patterns most similar to the exogenous ligand tamoxifen in the ground truth simulation of ER-,  which suggests that our interpretation of the contact signatures of non-natural ligands is correct.Like the case in ER-, AutoDock is able to find a low energy conformation for all BRCA1-mimetics, which indicates that the drugs bind across all steroid nuclear receptors.Also similar to the ground truth simulations, we see a distinct contact signature in the lower energy binding bins of fewer contacts.

Sequence Alignment
An interesting exception to this observation is in the case of ER-.We found evidence that the BRCA1-mimetics interacted with tamoxifen in the ligand binding pocket.This result is interesting in light of the result that some of the BRCA1-mimetics rescued tamoxifen-resistance [8].The authors of the original paper, Ma et al., hypothesized that the drugs were able to resensitize cells to tamoxifen by means of a conformational change; our results suggest that the results may also be explained by means of direct interaction.
Of note among the figures is the unique signature of AR binding to these drugs.Although our methodology does not have the power to identify the cause, it appears that there may be an energy well associated with the hydrophobic cleft in AR.This is indicated by the low overlap between the two horizontal blocks of contacts along the x axis.In other words, there appear to be distinct binding sites with clearly separated energies.
The 4th row of 4 corresponds to the binding of NSC88999 across the receptors.With the possible exception of ER-, the contacts appear to be highly separated by sequence and associated with non-continuous energy levels.We do not have a sure explanation for this observation, however we believe that it is related to the shape of the molecule.NSC88999 has two bulky groups attached by a single bond, in contrast to the other BRCA1-mimetics, which have smaller groups and more rotatable bonds.

CONCLUSION
The work reported in this paper has demonstrated that the classical steroid receptors contain a generic nature of the BRCA1-mimetic binding site previously identified by Ma and coworkers [8].It also demonstrates a pattern of expected behavior of contacts in energetic space that can be used to understand the predicted binding sites of drugs in docking studies.
Utility of frequency versus/and energies.We report that frequencies alone cannot be relied upon to predict the contacts involved in ligand binding for drug studies, which is not clear a priori.This arises from the fact that the simulations explore conformational space in a biased manner.Instead, we propose the usage of energy and contacts together to understand the contacts made in lowenergy poses.In addition to the greater information available for a variety of low energy binding modes, we believe that the analysis of the structural information provided by such an analysis can be used to refine structure based drug design.Our method provides a list of candidate contacts that can be enhanced or mitigated by structural changes to the candidate drug molecules.In addition, we believe our method may assist in the accurate selection of binding poses from among top candidates without the in depth inspection of predicted poses currently recommended.This will be beneficial as drug docking software aims to become less knowledge intensive, as in the development of AutoDock Vina.
BRCA1 mimetic specific information.In our simulations we demonstrate that the BRCA1-mimetics are predicted to bind across all steroid nuclear receptors in the same location previously identified by Ma and coworkers [8], as we hypothesized.However, because these drugs were designed to interrupt the interaction of BRCA1 and ER-, it cannot simply be accepted that the drugs will have an effect.As far as we are aware, a direct interaction of BRCA1 and GR has not been hypothesized, and therefore it has not been characterized.This seems to us an important avenue to pursue given the emerging role of GR in hormonally-related cancers.Knowledge about MR and any potential MR-BRCA1 interaction is also lacking.
Finally, caution should be exercised when considering the predicted binding of drugs.Evidence of binding is not evidence of action.For example, even when BRCA1 binds to the site identified on ER, as is likely in the case of AR, transactivation happens through a second binding site in the N-terminal domain [23].Similarly, PR binds BRCA1 differently, so it would not disrupt the interaction, despite evidence of binding that we provide here.More research is required to asses the effects of binding at this location.
Future directions.There is much work to be done to further ascertain the usefulness of contacts as a strategy for assessing binding sites.For example, work in both the dry and wet lab settings should move in the direction of ascertaining the accuracy of such predictions compared to methods suggested by the authors of AutoDock.Assessments of potential GR-BRCA1 interactions should also be carried out in order to understand potential cross-target effects in using BRCA1-mimetic drugs.Future computational work should include the identification of molecules that differentially bind to members of the SNR family.Finally, the authors of the original work suggested that binding at this location triggered a conformational change.It is possible that this conformational change is a generic feature upon binding and it may be investigated for potential therapeutic use.

Figure 1 :
Figure 1: Top: ClustalW [19] MSA of nuclear sex hormone receptors, visualized with pyMSAviz [17].Purple dots are residues important to at least one drug as predicted for ER-.Residues are colored by identity.Bottom left: Structure alignment (global rmsd minimization-based) of all the receptors.Bottom right: The predicted ER- binding site for 4631 aligned with corresponding residues for the remaining receptors.The receptor labels reflect ER- residues.

Figure 2 :
Figure 2: All docked 4631 structures shown in the ER- receptor for reference.

Figure 3 :
Figure 3: Blue arrows show ground truth (native) contacts as detected from the crystal structure.The top panel relates the structure-/frequency-based analysis of contacts for each of the receptors.Bottom panel includes free energy.

Figure 4 : 6 ACKNOWLEDGMENTS
Figure 4: Log frequency contact heatmaps of all steroid nuclear receptors.Columns contain all BRCA1-mimetics docked for a given receptor.Rows represent the results of each drug scross all receptors.The order from top to bottom is: NSC4631, NSC35446, NSC81747, NSC88999, and NSC372127.