An Introduction to Molecular Modeling

Allen B. Richon

Introduction

Drug design is an iterative process which begins when a chemist identifies a compound that displays an interesting biological profile and ends when both the activity profile and the chemical synthesis of the new chemical entity are optimized. Traditional approaches to drug discovery rely on a step-wise synthesis and screening program for large numbers of compounds to optimize activity profiles. Over the past ten to twenty years, scientists have used computer models of new chemical entities to help define activity profiles, geometries and reactivities. This article introduces the concepts of molecular modeling and contains references for further reading.

One of the basic tenets of medicinal chemistry is that biological activity is dependent on the three-dimensional placement of specific functional groups (the pharmacophore). Over the past few years, advances in the development of new mathematical models which describe chemical phenomena and development of more intuitive program interfaces coupled with the availability of faster, smaller and affordable computer hardware have provided experimental scientists with a new set of computational tools. These tools are being successfully used, in conjunction with traditional research techniques, to examine the structural properties of existing compounds, develop and quantify a hypothesis which relates these properties to observed activity and utilize these "rules" to predict properties and activities for new chemical entities. The development of molecular modeling programs and their application in pharmaceutical research has been formalized as a field of study known as computer assisted drug design (CADD) or computer assisted molecular design (CAMD).

[Photo 1]

Photo 1
Ferrodoxin rendered from a Brookhaven PDB file
Iron complexed to sulphur containing residues highlighted

Computational chemistry/molecular modeling is the science (or art) of representing molecular structures numerically and simulating their behavior with the equations of quantum and classical physics. Computational chemistry programs allow scientists to generate and present molecular data including geometries (bond lengths, bond angles, torsion angles), energies (heat of formation, activation energy, etc.), electronic properties (moments, charges, ionization potential, electron affinity), spectroscopic properties (vibrational modes, chemical shifts) and bulk properties (volumes, surface areas, diffusion, viscosity, etc.). As with all models however, the chemist's intuition and training is necessary to interpret the results appropriately. Comparison to experimental data, where available, is also important to guide both laboratory and computational work.

The starting point for many computer assisted molecular design (CAMD) studies is generally a two dimensional drawing of a compound of interest. These diagrams can range from notebook or "back-of-the-envelope" sketches to electronically stored connection tables in which one defines the types of atoms in the molecule, their hybridization and how they are bonded to each other. Carbon dioxide, for example, would be defined as one SP² oxygen atom (atom number 1) bonded to an SP carbon atom (atom number 2) with a double bond which in turn, is bonded to a second SP² oxygen atom with a double bond. The connection table which defines carbon dioxide might be written as shown in Figure 1 (Note: this table was created using the TABLE command in Netscape 1.1).

atom #	Atom Name	Atom Type	Bound to Atoms
1	O	5	2
2	C	2	1, 3
3	O	5	2

Figure 1

Connection Table for Carbon Dioxide

Connection tables are easily stored and searched electronically. However, they must be transformed into three dimensional representations of chemical structure to study chemical properties. Chemists use the mathematical descriptions of the rules of physical chemistry which are contained in quantum mechanics and molecular mechanics to accomplish this task. This paper will provide a brief introduction to these methods. Detailed reviews are available from a number of sources in the open literature^[1-4].

Quantum mechanics is one of the oldest mathematical formalisms of theoretical chemistry. In its purest form, quantum theory uses well known physical constants such as the velocity of light , values for the masses and charges of nuclear particles and differential equations to directly calculate molecular properties and geometries. This formalism is referred to as ab initio (from first principles) quantum mechanics.

The equation from which molecular properties can be derived is the Schrodinger equation,

[Equation 1]

where E is energy of the system relative to one in which all atomic particles are separated to infinite distances, PSI is the wavefunction which defines the Cartesian and spin coordinates of the atomic particles and H is the Hamiltonian operator which includes terms for both potential and kinetic energy. Unfortunately, the Schrodinger equation can be solved only for very small molecules such as hydrogen and helium. Approximations must be introduced in order to extend the utility of the method to polyatomic systems.

The first approximation attempts to differentiate nuclei and electrons. It assumes that nuclei are much heavier than electrons and move much more slowly so that molecular systems can be viewed as electrons moving in a field of fixed nuclei (the Born-Oppenheimer approximation). Solutions to the Schrodinger equation using this assumption lead to values of effective electronic energy which are dependent on relative nuclear coordinates. As the nuclei are moved to new coordinates and molecular energies are re-calculated, a quantitative description of molecular energy is derived. This description, which relates energy to geometry, is referred to as the potential energy surface for the molecule. The lowest point on this surface, with respect to energy, is the ground state energy (and its associated geometry) for the molecule.

The second approximation allows the wavefunction PSI to be represented as the product of one-electron (or spin) orbitals. The functions that are used to describe these orbitals are referred to as basis functions. This formalism is referred to as the Linear Combination of Atomic Orbitals (LCAO) theory. Once the orbitals have been derived, the orbital coefficients (which define the energy of the system) are calculated. Hartree-Fock theory is used to accomplish this goal.

Hartree-Fock assumes that the energy of a set of molecular orbitals can be derived from the basis set functions which are used to define each orbital and a set of adjustable coefficients which are used to minimize the energy of the system. The energy calculation becomes an exercise in solving a set of (N X N) matrices to obtain optimal values for the orbital coefficients. Since this calculation requires a value for the coefficients in order to solve the equations, an iterative process is used in which an initial guess for the value of the coefficients is progressively refined until it provides consistent values. This method is referred to as the self-consistent-field (SCF) theory.

Quantum mechanics utilizes a set of mathematical descriptions to define a theoretical model for the behavior of molecules. The validity of these models can be gauged by comparing structures and properties derived from the model with experimental results. In general, ab initio methods are able to reproduce laboratory measurements for properties such as the heat of formation, ionization potential, UV/Visible spectra and molecular geometry.

Since quantum methods utilize the principles of particle physics to examine structure as a function of electron distribution, their use can be extended to the analysis of molecules as yet unsynthesized and chemical species which are difficult (or impossible) to isolate. Geometries and properties for transition states (where the electronic character of component atoms is shifting from that found in the starting material to that of the products) and excited states (where the electronic configuration of the molecule is temporarily perturbed by adding energy to the molecule) can only be calculated using quantum methods.

Ab initio quantum methods compute a number of solutions to a large number of equations. While recent publications have reported calculations on large molecules^[5], the methods are generally limited to compounds containing between ten and twenty atoms due to the amount of computer time required for each calculation and the large amount of disk space needed to store intermediate data files. Physical/theoretical chemists have developed alternative approaches to computing structures and properties by simplifying portions of the calculation to circumvent these limitations. These methods are referred to collectively as semiempirical quantum methods.

Semiempirical methods utilize approaches which are similar to ab initio methods, but several approximations are introduced to simplify the calculations. Rather than performing a full analysis on all electrons within the molecule, some electron interactions are ignored. These methods include the Huckel approach for aromatic compounds (in which the outer electrons in conjugated systems are treated, but the inner (or core) electrons are ignored) and the Neglect of Differential Overlap formalisms found in the CNDO (Complete Neglect of Differential Overlap) and INDO (Intermediate Neglect of Differential Overlap). In these methods, the more complex portions of the ab initio calculation are ignored or set to zero. Other semiempirical approaches replace complex portions of the calculation with parameters which are derived from experimental data.

While semiempirical methods require less computer resources than ab initio methods, they are still compute intensive. In general, calculations are routinely performed on compounds which contain up to 100 atoms. The chief drawback of the method is that its application is limited to systems for which appropriate parameters have been developed.

Application of quantum techniques in drug design are detailed in many of the primary journals. Recently, the computational group at Sandoz has reported the use of Gaussian-90 (an ab initio program) in the analysis of the electronic and tautomeric factors for a model of H2 receptor agonists^[6]. Scientists at The University of San Luis have examined conformational energy maps and electrostatic potentials derived from semiempirical calculations to differentiate between H2 receptor antagonists and H3 receptor agonists^[7].

Quantum methods can also be used to examine the forces which determine protein and peptide stability. Sligar and Robinson have reported preparation of the first system which permits measurement of electrostatic effects on the stabilization of helical proteins^[8]. D.R. Ripoll and co-workers have utilized calculations from DELPHI (quantum calculations based on density functional theory) to develop a proposed mechanism for capture of substrate molecules by acetylcholinesterase^[9]. Workers at the University of Nancy have used high-level quantum calculations to define six new conformations for peptides. All of the proposed conformations were found in proteins whose structures were solved using X-ray crystallography^[10].

Experimental chemists routinely work with compounds which range in size from several hundred atoms (drug candidates, monomers, agricultural chemicals, etc.) to several thousand atoms (proteins, nucleic acids, carbohydrates, polymers, etc.). The computational requirements for quantum mechanical approaches render these methods unusable for routine analysis of these types of compounds. Thus, a further simplification in the way molecular geometries and their associated properties are computed is required. This approach is the molecular mechanics or force field method.

Molecular mechanics is a mathematical formalism which attempts to reproduce molecular geometries, energies and other features by adjusting bond lengths, bond angles and torsion angles to equilibrium values that are dependent on the hybridization of an atom and its bonding scheme (this atom description is referred to as the atom type). Rather than utilizing quantum physics, the method relies on the laws of classical Newtonian physics and experimentally derived parameters to calculate geometry as a function of steric energy. The general form of the force field equation is

E_pot is the total steric energy which is defined as the difference in energy between a real molecule and an ideal molecule. E_bnd, the energy resulting from deforming a bond length from its natural value, is calculated using Hooke's equation for the deformation of a spring (E = 1/2 K_b(b - b_o)² where K_b is the force constant for the bond, b_o is the equilibrium bond length and b is the current bond length). E_ang, the energy resulting from deforming a bond angle from its natural value, is also calculated from Hooke's Law. E_tor is the energy which results from deforming the torsion or dihedral angle. E_oop is the out-of-plane bending component of the steric energy. E_nb is the energy arising from non-bonded interactions and E_el is the energy arising from coulombic forces.

When the terms shown in the general form of the force field are expanded, the equation becomes

[Equation 3]

The manner in which these terms are utilized to build a model is referred to as the functional form of the force field. The force constants

[Equation 4]

and equilibrium values

[Equation 5]

are atomic parameters which are experimentally derived from X-ray, NMR, IR, microwave, Raman spectroscopy and ab initio calculations on a given class of molecules (alkanes, alcohols, etc). The energy of the atoms in a molecule is calculated and minimized using a variety of directional derivative techniques.

In contrast to ab initio methods, molecular mechanics is used to compute molecular properties which do not depend on electronic effects. These include geometry, rotational barriers, vibrational spectra, heats of formation and the relative stability of conformers. Since the calculations are fast and efficient, molecular mechanics can be used to examine systems containing thousands of atoms. However, unlike ab initiomethods, molecular mechanics relies on experimentally derived parameters so that calculations on new molecular structures may be misleading.

Each of the methods described above are used to calculate the energy of a compound in a specific 3D orientation and to optimize the geometry as a function of energy (i.e., adjust the coordinates of each of the atoms and recompute the energy of the molecule until a minimum energy is obtained). Coupled with other numerical techniques, they also can be used to simulate the time-dependent behavior of molecules (molecular dynamics) and explore their conformational flexibility (conformational search).

Molecular dynamics combines energy calculations from force field methodology with the laws of Newtonian (as opposed to quantum) mechanics. The simulation is performed by numerically integrating Newton's equations of motion over small time steps (usually 10^-15 sec or 1 fsec). The simulation is initialized by providing the location and assigning a force vector for each atom in the molecule. The acceleration of each atom is then calculated from the equation a = F/m where m is the mass of the atom and F the negative gradient of the potential energy function (the mathematical description of the potential energy surface). The Verlet algorithm is used to compute the velocities of the atoms from the forces and atom locations. Once the velocities are computed, new atom locations and the temperature of the assembly can be calculated. These values then are used to calculate trajectories, or time dependent locations, for each atom. Over a period of time, these values can be stored on disk and played back after the simulation has completed to produce a "movie" of the dynamic nature of the molecule.

Molecular dynamics simulations have been used in a variety of biomolecular applications. The technique, when combined with data derived from Nuclear Magnetic Resonance (NMR) studies, has been used to derive 3D structures for peptides and small proteins in cases where X-ray crystallography was not practical. Additionally, structural, dynamic and thermodynamic data from molecular dynamics has provided insights into the structure-function relationships, binding affinities, mobility and stability of proteins, nucleic acids and other macromolecules that cannot be obtained from static models.

Paulsen and Ornstein used dynamics to explain substrate orientation and the product profile for the oxidation of thiocamphor bound to cytochrome P450^[11]. Karplus and Gelin demonstrated that conformational changes are required in hemoglobin to permit oxygen transport and discussed potential pathways for oxygen migration into the active site^[12]. This study is of interest because static models of hemoglobin, some of which are based on X-ray data, do not show sufficient entry space to permit oxygen molecules to diffuse into the active site. Dynamic motion of hinge regions in the protein is necessary to account for activity.

While molecular dynamics provides an excellent approach to searching regions of conformational space, it is not an exhaustive search. The active conformation of a molecule can be missed as the dynamics simulation skips over the hills and valleys of the potential energy surface. Since the active conformation at a receptor may not always be the minimum energy structure (defined as the structure with the 3D geometry that places the molecule at the lowest point on the potential energy hypersurface), it is important to examine all potentially accessible conformations.

For small molecules with a limited number of freely rotating bonds, this can be easily accomplished by driving each torsion angle stepwise over a 360 degree range. As an example, a graph of the conformationally dependent energy (shown along the Y-axis) of the molecule butane is shown in Figure 2.

Figure 2
Butane Conformers

Unfortunately, the number of conformations for a molecule (defined as the "non-identical arrangements of the atoms in a molecule obtainable by rotation about one or more single bonds" ^[13] ) generated in this manner rises exponentially with the number of bonds rotated. The theoretical number of conformations for a molecule can be calculated by the formula.

Number of conformers = (360/angle increment)^{(# rotatable bonds)}

Thus a molecule with four rotatable bonds searched in a 60 degree increment will generate (360/60)⁴ or 1,296 distinct conformations. If the energy for each structure is evaluated in one second, the search and evaluation will require approximately 22 minutes. Timing of this order of magnitude is acceptable for small and medium-sized molecules. The approach is not reasonable for proteins and peptides which contain hundreds or thousands of rotatable bonds. Conformational behavior for these types of molecules is examined using random searching techniques such as molecular dynamics, Monte Carlo and distance geometry. Andrew Leach has recently published a detailed review of these and other search methods^[14].

The remainder of this paper will explore three topics: systematic search and its application to drug discovery (the Active Analog Approach), the application of new algorithmic approaches to conformation searching (genetic algorithms) and display techniques for chemical properties (molecular surfaces).

If the size of a molecule limits exhaustive searching approaches, one can redefine the problem by modifying the algorithm, examining a subset of the structure or introducing screens to limit the number of conformations produced by the search. Marshall and co-workers utilize the latter approach to reduce the combinatorial problems of searching large sets of structures^[15]. Their method examines each conformation to assure that its steric energy is within a user specified range, that the conformer does not have any two atoms closer together than the sum of their van der Waals radii and that it obeys user defined distance ranges for specified atoms. In addition, the method uses logical operators on the maps of distances and angles, which were calculated from searches on previous molecules in the series, to constrain searches on new members of the series. When this method is applied to compounds which are active in a given biological assay, the final map of distances defines the 3D orientations of atoms in the molecule which are responsible for activity. This technique has been widely used in medicinal research. A recent report detailed the successful development of a receptor model based on the analysis of 28 angiotensin converting enzyme (ACE) inhibitors^[16].

One of the problems which is common to many of the methods described is the need to define the spatial regions where molecules can orient properly in a given active site. The algorithms used in energy minimization, conformational searching and pharmacophore identification all attempt to find optimal solutions to problems which have several potential solutions. Examples of this issue include the multiple-minima problem in molecular mechanics optimizations and combinatorial problems seen in conformational search. Computational chemistry software developers have recently started to investigate the use of genetic algorithms as an approach to circumvent these problems^[17].

Genetic algorithms attempt to use the rules of natural selection to subset computationally demanding tasks. Rather than run the problem as a linear progression or allocate potential solutions into tree structures, the algorithm randomly builds sets of solutions which are governed by an ongoing process of natural selection. For chemistry applications, the algorithm operates on strings of binary numbers which represent molecular composition, position and/or conformation (defined as the individuals). When individuals are acted upon by genetic operators such as crossover, mutation and selection factors, families of individuals evolve over a period of time. These families then can be prioritized by constraints which define optimal solutions to the problem.

Genetic algorithms have been applied to a variety of chemical problems including NMR studies of proteins and peptides (optimization of distance constraints), conformational search problems (systematic searching of conformational space for large structures) and optimization of the overlap of common pharmacophores in structurally diverse molecules. In each case, the algorithms significantly expanded either the scope of the problem under investigation or increased the speed and efficiency of the process relative to traditional algorithms.

Payne and Glen recently published work exploring the application of genetic algorithms to problems in molecular similarity, conformational search and pharmacophore development^[18]. The paper discusses development of the algorithm and its application to a variety of CAMD problems including fitting novel NMDA (N-methyl-D-asparate) receptor antagonists to the putative pharmacophore using distance constraints, fitting peptides to distance constraints derived from NMR and fitting two molecules with radically different structure, but similar activity.

The computational methods discussed in this paper are used to optimize molecular geometry and calculate physical and electronic properties. An equally important aspect of CAMD/CADD is the ability to display these properties in a manner which increases the chemist's ability to interpret experimental findings and correlate these finding with structural features. Molecular surfaces play an important role in these studies.

Drug discovery is a more complex problem than it was in the past due, in part, to the fact that the etiology of the diseases which we seek to control are more complex. The amount of data generated in a typical study can easily overwhelm the scientists who are responsible for guiding the study. Computer systems which store, manipulate and display chemical structures and their associated data have therefore become an important and growing tool in the research process^[19]. The continual improvement in algorithm design and incorporation of new mathematical models for chemical simulation promises further advances in this field.

REFERENCES

Ab Initio Molecular Orbital Theory, Warren J. Hehre, Leo Radom, Paul v.R. Schleyer and John A. Pople, John Wiley & Sons, New York, 1986.
Semiempirical Molecular Orbital Methods, J.J.P. Stewart, in Reviews in Computational Chemistry, VCH Publishers, 1990, vol. 1, Kenny B. Lipkowitz and Donald B. Boyd (editors), pages 45-81.
Molecular Mechanics, Norman L. Allinger and U. Burkert, ACS Monograph 177, Washington, DC, American Chemical Society, 1982.
Computer Simulation of Biomolecular Systems Using Molecular Dynamics and Free Energy Perturbation Methods, Terry P. Lybrand, in Reviews in Computational Chemistry, VCH Publishers, 1990, vol. 1, Kenny B. Lipkowitz and Donald B. Boyd (editors), pages 295-320.
Ab Initio Calculations on Large Molecules: Methodology and Applications, Jerzy Cioslowski, inReviews in Computational Chemistry, VCH Publishers, 1993, vol. 4, Kenny B. Lipkowitz and Donald B. Boyd (editors), pages 1-33.
The Computational Design of Amidines as H2-Receptor Agonists of Histamine, M. Sabio and S. Topiol, J. Mol. Struct. (Theo Chem), 279, 15-27, 1993.
Theoretical Study of Selective H3 Receptor Agonists of Histamine, R.D. Enriz, M.R. Estrada, E.A. Jauregui, C.A. Ponce and F. Tomas-Vert, J. Mol. Struct. (Theo Chem), 280, 5-15, 1993.
Electrostatic Stabilization in Four-Helix Bundle Proteins, C.R. Robinson and S.G. Sligar, Protein Science 2, 826-837, 1993.
An Electrostatic Mechanism for Substrate Guidance Down the Aromatic Gorge of Acetylcholinesterase, D.R. Ripoll, C.H. Faerman, P.H. Axelsen, I. Silman and J.L. Sussman, Proc. Nat. Acad. Science, 90, 5128-5132, 1993.
Peptide Models 6. New b-turn Conformations from ab initio Calculations Confirmed by X-ray data of Proteins, A. Perczel, M.A. McAllister, P. Csarzar and I.G. Csizmadia, J. Amer. Chem. Soc.,115, 4849-4858 1993.
Substrate Mobility in the Thiocamphor-bound cytochrome P450: An explanation of the Conflict Between the Observed Product Profile and the X-ray Structure, M.D. Paulsen and R.L. Ornstein,Protein Eng., 6 , 359-365, 1993.
Bruce R. Gelin and Martin Karplus, PNAS, 74, 801-805, 1977.
Conformational Analysis, Ernest Eliel, et. al., Wiley Interscience, 1965.
A Survey of Methods for Searching the Conformational Space of Small and Medium-Sized Molecules, Jerzy Cioslowski, in Reviews in Computational Chemistry, VCH Publishers, 1991, vol. 2, Kenny B. Lipkowitz and Donald B. Boyd (editors), pages 1-55.
G.R. Marshall, C.D. Barry, H.E. Bosshard, R.A. Dammkoehler and D.A. Dunn in Computer Assisted Drug Design, ACS Symposium Series 112, E.C. Olson and R.E. Christofferson, eds., American Chemical Society, Washington DC, 1979.
Constrained Search of Conformational Hypersurface, R.A. Dammkoehler, S.F. Karasek, E.F.B. Shands and G.R. Marshall, J. Comput-Aided Mol. Design, 3, 3, 1989.
Genetic Algorithms: Principles of Natural Selection Applied to Computation, Stephanie Forrest,Science, 261, 872-878, 1993.
Molecular recognition using a binary genetic search algorithm, A.W.R. Payne and R.C. Glen, J. Mole. Graphics, 11, 74-91, 1993.
Molecular Modeling: Altering How Chemistry is Done, James Krieger, Chem. Eng. News, September 23, 1991.

Drug Desiginig Using Modern Techniques In Computational Chemistry

Blog Archive

July 29, 2009