Namd Tutorial¶

Introduction and Basics¶

NAMD and alternatives¶

Nanoscale Molecular Dynamics/Not Another Molecular Dynamics program (NAMD) is a molecular dynamics (MD) package that is particularly good at scaling with significant amount of computational resources, making it most useful for very large molecules. NAMD is designed with biomolecules in mind, and the force fields available for use in NAMD have large libraries of parameters for a variety of biomolecules. One advantage of NAMD is easy access to a CUDA-enabled version, allowing the use of GPUs to dramatically accelerate the rate of molecular dynamics simulations – the scalability with GPUs is such that simulations with over 240 million individual atoms were performed with good performance on over 16,000 nodes. It’s also free, well-documented, and has support for a lot of relatively niche MD techniques that may not be as well-supported in other programs. It’s certainly not the only choice in biomolecule MD - OpenMM is fast and incredibly flexible, Amber is a well-supported ecosystem (though it is not free), and GROMACS is likely the single most popular choice and so is well-supported. However, the combination of efficient scaling, good GPU performance, being designed for the CHARMM family of force fields I most commonly used, and having exclusive access to some tools like gaussian-accelerated MD, NAMD is a program I still use regularly.

Force Fields¶

The NAMD MD package supports several possible force fields – the set of parameters that control how atoms interact in MD. The primary force fields available in NAMD are CHARMM, X-PLOR (a modification of CHARMM designed to facilitate interaction with experimental data), GROMOS, and Amber. Of these, CHARMM is the most commonly used with NAMD and the clear focus of the package - if you’re using other force fields, I’d consider using GROMACS, Amber, or OpenMM instead. This tutorial will assume the use of CHARMM parameters – but the process is similar for any of the above. If you’re interested in further discussion about these force fields, I’ve written a separate article discussing the differences between the commonly-used biomolecular MD force fields.

External Tools¶

Before any simulation can be run, NAMD, your chosen force field, your starting structure, and perhaps VMD (Visual Molecular Dynamics; a program to visualize and analyse MD results) must all be available to you. NAMD and VMD can be downloaded from the Theoretical and Computational Biophysics Group of the University of Illinois free of charge, currently at the following addresses: NAMD and VMD. The CHARMM force field has a single current force field: CHARMM36m, which is the de-facto default force field for the NAMD program.

The starting structures used in simulations are obtained from several potential sources – the source will vary depending on the structures being studied. The Protein Data Bank is one of the most common - it contains crystal structures and NMR solution structures of every publicly available protein that has been crystallised, as well as many DNA and RNA structures. Structures of lipids and carbohydrates are more often procedurally generated or created in programs like Avogadro and converted to the .pdb format. Structures being input into NAMD are typically in the .pdb file format. A helpful site for the generation of inputs into NAMD is the CHARMM-GUI website – it is capable of skipping the setup phase for many calculations. It also has useful tools for aspects of MD simulations that are more difficult to do manually - such as the generation of lipid bilayers, glycoproteins, or polarizable MD simulations. Although it is possible to use this to skip a good portion of the initial setup required for MD simulations, I would recommend avoiding this while learning - it is important to learn how to setup an MD simulation so that you can deal with the issues that will arise when things go wrong, and so it is best to have some experience doing these tasks before automating them.

Running a Simulation - Generalised Workflow¶

Set-up and system topology generation¶

Our first step in running a simulation is to set up the simulation for use in our MD package, and the second is to get the simulation started; we’ll be running through both of these steps for an example. Before we start this process, ensure you have access to VMD, NAMD, and your chosen force field (CHARMM36 in the case of the worked example later). The links to the download of these programs and packages are available above, and installation instructions are included (and vary depending on the machine you are installing it on). Once you have ensured these programs are installed and available, the next step is to get access to the starting structure in a .pdb format; this format contains information on the location of atoms, as well as the element of the atom and the chain of an atom (with each chain being a series of covalently-bonded monomers). This is of great importance, but more is required – in particular, the connectivity of the atoms is also required. A Protein Structure File (.psf) can be generated from the positioning of the atoms (the .pdb file you start with) and a type of parameter known as the topology that details the interconnectivity of the parameterized species. The program that does this is bundled with NAMD (and VMD) and is named psfgen. It requires an input file that details the location of your topology files, the .pdb file, and lists each chain in the input .pdb file. Further modifications can be done at this point if required – modifications to the final state (known as patches) can be applied that change partial charges, connectivity, and even delete or add atoms. A common example of this is the addition of a disulphide bridge – deleting the hydrogen atoms attached to two (spatially adjacent) thiol groups and forming a bond between the two, or the conversion of the first and last residues of each chain to capped residues – which is done automatically for recognised amino acids.

Solvation and Ionisation¶

With the .psf file generated, a complete description of the system is available – the location of atoms is stored in the .pdb file, and how they are connected is stored in the .psf file. The next stop in almost all calculations is the solvation and ionisation of the biomolecule you have represented. This can easily be done using VMD in one of two ways; an extension is in-built to allow solvation using a GUI, or a VMD script can be written in the TCL scripting language that solvates and ionizes a molecule. Ionisation can be done with a variety of cations – sodium, magnesium, potassium, caesium, calcium, and zinc – but chloride is the only available anion. Ionisation can be used to neutralize the system (which is a requisite of most modern MD methods), or to neutralize the system and continue adding salt until a set salt concentration is reached. In both cases, water is the default solvent, but any other solvent can be used provided you have access to an equilibrated box of the solvent from which to randomly sample. This box defaults to a rectangular or square box – the reason for which is discussed in Extra Information.

MD Configuration¶

Once you have solvated and ionized your system, it is essentially ready for simulation to begin. A visual inspection (using VMD) is recommended to ensure nothing abnormal has occurred up until this point, but once this is completed a configuration file for the upcoming simulation is all that is needed. Configuration files detail what is to actually occur in a simulation – the temperature at which it occurs, how long each timestep is, how often it will write its progress, what parameters to be using, how to handle non-bonded forces, and so on. A full description of available options in a configuration file is available in the NAMD User’s Guide, but a set of sample configuration files (for different types of simulations) are supplied alongside this tutorial. In addition, a table in Extra Information details many of the commonly used options in a configuration file. These files tend to be long, but can easily be reused between similar simulations – my advice would be to make a fairly detailed configuration file initially, and then modify it from there for each individual simulation.

Actually running the simulation¶

With a configuration file completed, you’re ready to start a simulation – you simply need to run NAMD with the configuration file as the argument. But most of your jobs aren’t going to be run locally – they’ll be run on supercomputers, where there are some extra steps that need to be resolved. MD scales well with increased core count, particularly when using NAMD, but does require a large system to scale effectively; the simulation box is cut into patches that are roughly the size of the long-range interaction cutoff, and each patch is split into (roughly 14) compute objects. Each compute object can be assigned to a separate processor – and so the larger your system, the more patches and more compute objects, and so the more parallelizable your system is. NAMD runs benchmarking automatically at the start of each simulation; if you are interested in the optimal number of cores to apply to a simulation, you can run 10-minute jobs and compare the benchmarking. NAMD doesn’t rely on an MPI framework for the underlying parallelisation, and so any number of cores can work – for a simulation under 50,000 atoms, I would not normally request more than a single node.

The amount of cores used is impacted by the availability of GPUs on the supercomputer being used – NAMD simulations can be dramatically accelerated by the use of CUDA GPUs (almost any modern NVIDIA GPU); for example, a 11,000 atom simulation of mine was shortened from 12 days to 2 days when adding 2 P100 NVIDIA GPUs. When submitting job files that run NAMD, no extra command is required to handle GPU support (provided you are running a CUDA-compiled version of NAMD), as the GPUs are automatically recognized.

The final bit of more general advice before moving on to an example is about restart files; upon successfully completing a simulation, NAMD will output a .coor file (equivalent to a .pdb; it contains the coordinates of the system), a .vel file (containing the velocities of all atoms in the system), and finally a .xsc file (containing records of the periodic cell) in a binary format (by default); if the simulation crashes or times out, the latest version of these restart files will be present in the output folder (with the .restart suffix). These can be used to resume simulation, and simply require the next simulation to use the coordinates and velocities commands instead of the coordinates and temperature commands. For equilibration simulations - where one might run an NVT simulation, followed by an NPT simulation, followed by a simulation in which restraints are relaxed - I would recommend submitting these short simulations together as one job, with each configuration file using the end point of the previous one.

Running a Simulation - Worked Example¶

Getting Started¶

Alongside this tutorial, a .zip file is available that contains all the files needed to run a simulation of insulin. The aim of this section is for you to be able to create all the files contained in this folder without having to download it, but it should be a useful resource to compare and troubleshoot with no matter how it is used. However, before we can get into the first step, we need to make sure we have NAMD installed and the CHARMM36 parameters available. The installation of NAMD is fairly simple – download the folder containing the executable, and make sure your path has access to it. I have a ‘namd2’ and ‘psfgen’ file in my /usr/local/bin directory that points toward the NAMD/psfgen executable in my Applications folder, but there are other ways you could do it - you could put those executables directly in your bin, or if you’re just testing out and are unsure what I’m saying here, you could give a path to the namd2 executable each time - ~/Desktop/namd/namd2 configurationFile.conf would work if you just put the namd folder in your desktop. Once you’ve done this for NAMD, repeat for VMD.

Installation of the CHARMM36 parameters is even easier – they simply need to be downloaded; when you make your psfgen and configuration files, you provide a path to these. In the bundled example files, the parameters are provided in the same folder. If you want to get them yourself, they’re available on the CHARMM webpage. For the sake of ease, you can these parameters into the same directory you’re working in, but when simulations are performed on the HPC facilities, you should have one folder that contains your parameters - that way you don’t duplicate them unnecessarily. With these programs installed, you now need to obtain the starting geometry – we’ll be using the ‘3I3Z’ crystal structure from the Protein Data Bank, which is accessible by the provided hyperlink, or by searching 3I3Z on the PDB website. Download the file in the .pdb format, and then we’re ready to get started!

!alt text

Creating System Topology¶

The first step in our process is to run the psfgen command to create a file with the bonding arrangement of our protein. Psfgen is a useful program, but it is also quite particular about what is required - and sometimes these requirements aren’t entirely clear. The easiest way to demonstrate how to run the program is with an example – and so the script used on the insulin molecule is produced below (with sections added in the in comments for ease of reference, as will all scripts shown in this tutorial):

# [1]
package require psfgen                  

# [2]
topology top_all36_prot.rtf             
topology toppar_water_ions_namd.str

# [3]
pdbalias atom ILE CD1 CD                
pdbalias residue HIS HSE

# [4]
segment A {                             
pdb 3i3z_prepared_a.pdb
}

segment B {                         
pdb 3i3z_prepared_b.pdb
}

# [5]
patch DISU A:6 A:11                     
patch DISU A:7 B:7
patch DISU A:20 B:19

# [6]
coordpdb 3i3z_prepared.pdb              

# [7]
guesscoord                              

# [8]
writepsf 3i3z_psfgen.psf                
writepdb 3i3z_psfgen.pdb

The start of the file (section [1]) simply ensures that psfgen is being used, as these scripts can be run through VMD as well as the psfgen program. Section [2] simply provides the path to the topology files required (the protein and water topologies respectively). Section [3] is a little more complicated – changes to the .pdb file are required, as there are some differences between the .pdb files used by crystallographers and those used in MD. The histidine amino acid can be in one of two protonation states – both are called HIS in crystal structures, as the hydrogen atom locations are rarely resolved experimentally. As the neutrally-charged histidine could be in one of two states, there are two different residues that represent it - HSE and HSD. There are a variety of ways to determine which should be chosen - visual inspection, web-servers exist to make reasonable guesses, and there are even specific MD techniques that can determine the most appropriate for each protonatable site, if the proton location is going to be critical to the simulation’s accuracy. The two lines in section [3] change the names of atoms or residues in .pdb files to those used by CHARMM – and similar lines will likely be required in all MD simulations of proteins you run. Section [4] details the various segments of the molecule - as the molecule is processed on a chain-by-chain basis, each chain in the .pdb file will require a different segment in the psfgen file. Inside of the segment you must provide a .pdb file for that chain (a small VMD script is supplied, splitABScript.tcl, that will generate these automatically while removing crystallographic water), and specific changes can be given – changing specific atom names, changing the capping groups of the chain, and so on. The script to split the insulin protein can be called using the following command:

vmd -dispdev text – e splitABScript.tcl

Section [5] details any changes you want to apply to the system that are not specific to a single chain – the examples listed are forming disulfide bonds between non-sequential residues in the chain (A6-A11, A7-B7, and A20-B19). If you’re interested in the changes available, a list of all available changes here is found by searching for PRES in the topology files loaded at the top of the script. Moving towards the end of the file, section [6] details the .pdb file containing the atoms of the entire system, and section [7] instructs psfgen to guess the location of any atoms that aren’t present in that .pdb (for example, in the .pdb file being used, hydrogen atoms are omitted). If you’ve made major changes to the atoms with these patches - for example, changed the element of an atom, or substantially altered the chemical environment - it’s best to tell psfgen to re-fetch the appropriate parameters for the new environment with this line regenerate angles dihedrals, but this was not needed in this case. Finally, section [8] details the output – writing the new .psf and .pdb files. Once you’re happy with the way the script works, you can run it using the psfgen executable:

psfgen psfgen.pgn > psfgen.log

If the log isn’t showing errors, I’d advise loading up the resulting structure in VMD and inspecting it to make sure it looks as expected:

vmd 3i3z_psfgen.psf 3i3z_psfgen.pdb

Solvation and Ionization¶

If there aren’t any obvious errors (and psfgen errors are typically obvious – bonds far too long, or more commonly a cluster of atoms with 0,0,0 x,y,z coordinates), you’re now safe to move on to the next step. VMD is used to solvate and then ionize the protein – this can be done inside of VMD (through the Extensions -> Modelling -> Add Solvation Box/Ions menu) or via a script. The use of the modelling menu opens a small window – most options are fairly self-explanatory, with the only changes needing to be made in the solvation menu being the addition of 12 Angstroms of box-padding in all directions (ensuring there is a 12 Angstrom distance from any part of the protein to the edge of the solvation box), and for ionization you simply need to choose the option to neutralize and add salt to a concentration of 0.2 M (mimicking experimental conditions, in this case). Doing this instead by script is simple (and I would recommend it – simply because it’s consistent and you can easily check what you did):

# [1]
package require solvate                     
package require autoionize

# [2]
set inputName 3i3z                          
set psfgenTerm _psfgen
set psfgenName $inputName$psfgenTerm
set solvateTerm _solvated
set solvateName $inputName$solvateTerm
set ionizedTerm _ionized
set ionizedName $inputName$ionizedTerm

# [3]
solvate $psfgenName.psf $psfgenName.pdb -t 12 -o $solvateName 

# [4]
# Need to neutralize as PWE electrostatics require a neutral system, so properly done PBC needs a neutral system
autoionize -psf $solvateName.psf -pdb $solvateName.pdb -sc 0.2 -cation SOD -anion CLA -o $ionizedName                               

# [5]
set backbone [atomselect top "backbone"]
set all [atomselect top all]
$all set beta 0
$backbone set beta 1
$all writepdb 3i3z_ionized.pdb

# [6]

proc get_cell {{molid top}} {
 set file [open boxSize.rtf w]
 set all [atomselect $molid all]
 set minmax [measure minmax $all]
 set vec [vecsub [lindex $minmax 1] [lindex $minmax 0]]
 puts $file "cellBasisVector1 [lindex $vec 0] 0 0"
 puts $file "cellBasisVector2 0 [lindex $vec 1] 0"
 puts $file "cellBasisVector3 0 0 [lindex $vec 2]"
 set center [measure center $all]
 puts $file "cellOrigin $center"
 $all delete
 close $file
}

get_cell

exit

Both sections [1] and [2] are simple – section [1] states the packages used for VMD, and section [2] defines variables to allow output in the correct format. Section [3] solvates the previously created structure from psfgen; the only unexpected argument is -t 12, which creates a buffer of at least 12 Angstroms from the solute to the edge of the solvent box. Section [4] controls the ionization process, with the -sc argument controlling the concentration of ions, and SOD and CLA being the CHARMM residue names for sodium and chlorine ions. Finally, section [5] makes a small modification to the final .pdb file – the beta value of all atoms is set to 0, then the beta value of the backbone atoms of the protein chain are set to 1, and the changes are written to the .pdb file. This is done to facilitate constraining the protein in the next step - it means we can apply restraints only to the backbone of the protein. Finally, in section [6], we define a function called get_cell; this measures the origin and dimensions of the water box created, which is information we need to pass to NAMD later on in this process. It then outputs this information into a file called boxSize.rtf for us to use later, and we then call for an exit of the VMD program.

Running a simulation¶

Now that we’ve finished solvating and ionizing the system, we’ve finally got our system ready to go! We can’t just launch straight into a production run, however – we’ve got to minimize the structure to avoid energetic clashes, and let equilibration occur to relax the protein from its crystal structure and let the water box adapt to the presence of the protein. The exact process one uses to do this differs depending on what you’re doing in your system, but the general trend stays the same – allow the solvent to equilibrate, allow the box to fluctuate to its preferred size, and allow the solute to equilibrate. Running a simulation based on a configuration file is simple:

namd2 +p(number of cores to be used) /path to configuration file/ > /path to desired output log file/

e.g. namd2 +p4 prodRun.conf > prodRun.log

Configuration Files¶

For this example, we’ll look through the configuration files in order – a period in the canonical (NVT) ensemble with the protein frozen will allow the equilibration of the solvent, a period in the NPT (isothermal-isobaric) ensemble will allow the box size to fluctuate, and a period in the canonical ensemble with protein restraints being reduced will allow the solute to equilibrate. These files are called solventNVT.conf, solventNPT.conf, and soluteNVT.conf respectively.

Initial Equilibration¶

We’ll go through the first of these files in detail, then explain any new additions for the remaining files. The first area of interest is the relatively mundane opening section:

# [1]
# NAMD configuration file for a insulin CHARMM36 NVT ensemble run at the beginning of a simulation                              

# [2]
# Declaring of global variables         
set temperature 298

# [3]
# Initial structures from previous steps - ionized and solvated - can have restart info here
structure 3i3z_ionized.psf
coordinates 3i3z_ionized.pdb

# [4]
# Output                                        
binaryOutput no ;# Don't want binary format, not pdb, for output
binaryRestart no ;
outputName 3i3z_solventEquilibrated
DCDfile 3i3z_solventEquilibrated.dcd
restartfreq 5000
dcdfreq 1000
xstFreq 5000
outputEnergies 100
outputTiming 1000 ;# Typically want x10 outputEnergies

# [5]
# Forcefield (paratypeCharmm specifies Charmm-style parameters, as opposed to others that NAMD can use)                             
paraTypeCharmm on
parameters par_all36_prot.prm
parameters toppar_water_ions_namd.str

It can be seen here that much of this initial part of the file is dedicated towards simple setup of the simulation. Section [1] is a simple title for the file (making it easier to open and see what you did without looking through the whole thing), Section [2] declares the only global variable required for this run (so you don’t need to have temperature set in several locations), and Section [3] simply provides the path to the .psf and .pdb files generated earlier. Section [4] details the output of the simulation – turning off binaryOutput and binaryRestart allows human-readable output files, outputName and DCDfile are the output names for the restart output and trajectories respectively; all remaining options here are the frequencies (in number of steps) of the output of restarting files, trajectories, box information, energies, and benchmarking respectively. The next section of the configuration file deals with the more theory-focused options:

# [6]
# Approximations - the higher these cutoff distances, the more accurate - these are cutoffs that are accurate for protein/biomolecular systems 
switching on
switchdist 10 # 2 Angstrom below the cutoff distance
cutoff 12 # Should be at least a few Angstrom beyond longest reasonable long-range interactions in system
pairlistdist 14 # pairlistdist - cutoff should be at least the distance that can be moved in one cycle, as defined in stepspercycle
stepspercycle 20
exclude scaled1-4
1-4scaling 1.0

# [7]
# Electrostatics                             
PME yes
PMEGridSpacing 1 ;# Essentially means NAMD is specifying the grid - can manually do so, but not needed

# [8]
# Integrator                                
timestep 2.0
rigidBonds all
nonbondedFreq 1
fullElectFrequency 2

# [9]
# Constant Temperature Control      
langevin on
langevinDamping 1.
langevinTemp $temperature
langevinHydrogen no

In section [6], the approximations that are used are listed. Switching is a gradual reduction of non-bonded forces at the listed distance, leading to a complete cut-off at the cutoff distance; the pairlistdist, stepspercycle, and 1-4scaling properties are thoroughly defined in the NAMD documentation, but the first two deal with the creation of a list of atoms that will have non-bonded interactions calculated, whereas the third controls whether atom pairs that have three bonds between them have non-bonded forces calculated (as different force fields parameterise their bonded parameters with this in mind). Section [7] is simple – it enables Particle Mesh Ewald summation for the long-distance electrostatic interactions, and then allows NAMD to choose the grid size. Section [8] deals with the integrator – the timestep is set to 2fs, rigidBonds are set to all hydrogen-heavy atom bonds, and the frequency of non-bonded calculations are set here. Finally, Section [9] controls the thermostat the regulates temperature – it is set to a Langevin thermostat, controlled to the set temperature, and doesn’t affect hydrogen atoms. We then get to the options related directly to our system:

# [10]
# Periodic Boundary Conditions - vector1/2/3 is the vector to the next image, cellorigin is the center of the cell; see the get_cell script in the NAMD tutorial or set of scripts to generate these                 
cellBasisVector1 47.691999435424805 0 0
cellBasisVector2 0 47.525999546051025 0
cellBasisVector3 0 0 55.8179988861084
cellOrigin -9.467723846435547 -17.910892486572266 -0.35832399129867554

wrapWater on ;# Wrap water molecules
wrapAll on ;# Wrap all non-water molecules
wrapNearest off ;# This is used for non-rectangular cells

# [11]
# Constrained Atoms - a harmonic restraint, typically applied to the solute of the system to allow solvent equilibration                 
constraints on
consexp 2 ;# The exponent used in the harmonic constraint
consref  3i3z_ionized.pdb ;#
conskfile 3i3z_ionized.pdb ;# The file where the force constant for the atom's restraining potential is found - can be same as above
conskcol B ;# Use the beta column of the above file
constraintScaling 250 ;# Constraints for the minimization

The first section here, Section [10], details the periodic boundary conditions. The first part contains the three vectors and the central point of these vectors; together, these describe the size of the water box. These vectors are created automatically – they are found in the file boxSize.rtf created in the solvation script provided, and the contents of this file can be copied directly to this area without issue. The second half controls what atoms can move from one side of the solvation box to the other – all molecules, water and not-water are included. Section [11] contains the commands related to constraining the atoms – initially it enables constraints and provides the files in which the constraints should be applied (consref) and the list of atoms that should be constrained (conskfile). The conskcol command details how the list of atoms is provided – in this case, it’s the beta column of a .pdb file, where any non-0 number present in this column will enable constraints. When ionizing our system, we set the beta value of all backbone atoms to 1 to facilitate these restraints. The constraintScaling parameter defines the strength of these constraints. With this completed, we just have the final section left:

# [12]
# Protocol
temperature $temperature
minimize 50000
reinitvels $temperature ;# Needed after minimization
run 500000

This section is very simple – it generates velocities for atoms at the set temperature, minimizes for a set number of steps, reinitiates the velocities, and finally actually runs the simulation for a set number of steps. With this section completed, we’ve run through the whole configuration file.

Changes for NPT equilibration¶

Before we move on from configuration files, however, we need to discuss the changes that occur in the next step – NPT equilibration; they are fairly minimal, with a Constant Pressure Control section being the sole addition:

# Constant Pressure Control (allows the volume to vary - i.e. makes the simulation NPT not NVT)
useGroupPressure yes ;# Without this, can't have the rigid hydrogen bonds needed for a 2fs timestep
useFlexibleCell no ;# Not needed for a waterbox, if using a membrane protein it is needed
useConstantArea no ;# Not needed for a waterbox - for a membrane protein, use in a case-by-case basis
langevinPiston on
langevinPistonTarget 1.01325 ;# 1 atm pressure in bar
langevinPistonPeriod 100. ;# Oscilation period around 100 fs is standard
langevinPistonDecay 50. ;# Oscilation decay around 50fs - want it smaller than the Period to ensure harmonic oscillations are overdamped
langevinPistonTemp $temperature ;# Coupled to the heat bath

This section allows the variation of the volume of the box – a flexible cell allows each length of the box to vary independently, which is not normally required (and is slightly more complicated than not having it enabled) unless you’re using a membrane. The langevinPiston is the method through which this occurs; it has a set target (1 atm normally), which oscillates over a set period of time. This period should be kept fairly long, as pressure will naturally vary quite dramatically over time in MD simulations.

Changes for final equlibration¶

Outside of the addition of the pressure control section, the other change in the remaining configuration files occurs in the soluteNVT.conf file, and details the reduction of constraints on the protein - this requires a little tcl scripting in the protocol section:

# Protocol
run 5000
for {set i 0} {$i <= 3} {incr i 1} {
        set ck [expr {250.0 *(1.0/(10.0**$i))}]
        constraintScaling $ck
run 5000
} ;# This one is a bit of a little trick - run for 5000 steps, then decrease the energy of the constraints by 10x, and repeat till at 0.25 kcal/mol, then run without any constraints for a while to allow equilibration of the solute
run 500000

As can be seen above, the protocol section from the soluteNVT.conf file is a little different from the previous two – a small for loop is created in the TCL scripting language in which the constraintScaling parameter is exponentially reduced as the loop is continued, and short sets of steps are run. Once this loop is completed, the simulation runs for a nanosecond of NVT equilibration without any restraints. With this completed, the protein is fully equilibrated and ready for a production run – the actual simulation. The production run is very simple – it uses a configuration file that’s almost identical to the initial solventNVT file with the constraints removed, and a much longer run time. Have a read through the prodRun.conf file – if you’re not sure what anything does, it should be explained above (or in the NAMD documentation, if you’d rather get used to it now).

Conclusions¶

With the above completed, you now have a final trajectory available to you. You’re not quite finished yet - how you analyse it is also important! There are many possible forms of analysis in MD, and what is useful or not is dependent on the system you’re analysing. However, analysis is at least as large a topic as running the simulation, and the methodologies that one typically uses to analyse these simulations aren’t dependent on the program used to simulate. Because of this, I’ve separated out analysis from this tutorial, and written a separate tutorial about analysis of biomolecular MD simulations! This tutorial demonstrates the basic workflow used to perform an MD simulation in NAMD, and there is definitely more complexity to it than shown - but hopefully this helps you get started!

Extra Information¶

Solvation Boxes and Periodic Boundary Conditions¶

Solvation boxes default to a square or rectangular shape for a good reason - it is possible to solvate in other shapes, with spherical solvation even minimizing the solvent atoms:solute atoms ratio (and so maximizing cost efficiency of the calculation), but these are rarely used. The evaluation of long-range ionic interactions is difficult computationally, as they reduce in intensity across distance considerably more slowly than the Lennard-Jones model of non-coulombic interactions. Almost all modern MD packages use a Particle Mesh Ewald (PME) method to solve this – after a set distance, a grid is created (typically in 1 Å² grid sizes) and the sum of all charges in this grid is used for the interaction. This is computationally considerably cheaper, but this process continues until no new charges are being found – this requires a net neutral system, and periodic boundary conditions (PBC) to be in effect. Periodic boundary conditions involve atoms at the edge of the solvent box being affected by the forces of atoms on the exact opposite side of the box – if an atom travels out of the box, it is instead moved to the opposite side of the box. This effectively eliminates boundary effects in which the solute experiences a different environment at the edge of the box, but this necessitates a rectangular solvation box.

NAMD Configuration File Options¶

Configuration Option Name	Argument(s) Given	Effect
structure	A path to a .psf file	Informs the simulation of the connectivity of atoms present
coordinates	A path to a .pdf file	Informs the simulation of the xyz location of atoms present
restartfreq	An integer	How often (in number of steps) the full set of restart files are output
parameters	A path to a forcefield parameter file	Informs the simulation of the parameters for atoms in the simulation; multiple of these are almost always required
cutoff	An integer	The point at which non-coulombic long-range interactions are cut off
PME	A Boolean yes/no	Enables Particle Mesh Ewald summation for coulombic interactions (see extra information)
timestep	Floating point	The amount of time (in femtoseconds) the simulation jumps in-between each step; longer is more computationally efficient, but may miss more information
Langevin	A Boolean on/off	Enables Langevin dynamics to control the temperature of the simulation
cellBasisVector½/3	A vector of 3 floating points	There are three of these – together with the cellOrigin, these vectors define the 3d space that is the simulation box; the get_cell script will define these for you
wrapAll	A Boolean on/off	Enables all molecules to ‘wrap’ – move from one end of a simulation box to the other when under periodic boundary conditions
langevinPiston	A Boolean on/off	Enables a Langevin piston that controls the pressure of the simulation in addition to the temperature; this allows the NPT ensemble to be used
constrains	A Boolean on/off	Allows the constraining of atoms with a harmonic potential; useful to allow things such as the solvent equilibrating without interference from the solute, or vice versa
temperature	A floating point	Sets the temperature of the simulation to this value (in K)
minimize	An integer	Starts a minimization; this is a minimization by steepest descent by default and runs for the given number of steps. Once a simulation has started, many options cannot be changed – so put this at the end of your configuration file.
run	An integer	Starts a simulation that runs for the given number of steps; again, this locks in many options and so should be done at the end of a configuration file.