Automating the LAMMPS data file writing

Building a LAMMPS data file by hand is time-consuming, especially for larger systems. To speed up the process, widely available software can be used to generate an initial file containing the atomic and molecular information. For instance, CHARMM-GUI (see link) is a useful tool for generating LAMMPS data files for biological systems. However, we have not been able to find any free-to-use software that creates LAMMPS data files for more general applications. The most accessible option is to first generate a file in a robust molecular format, such as .mol2. This format is a good choice because it stores a wide range of information. Most importantly, a .mol2 file can specify the XYZ coordinates of atoms, their element symbols, hybridisation, certain connectivity types, charges, molecule IDs, and the IDs of bonded atoms.

To generate the .mol2 file, we found the most useful programs to be Avogadro (see link) and Atomic Charge Calculator II (see link). Avogadro is useful for assembling molecules manually in a comprehensive 3D interface and offers tools such as geometry optimisation. Files can be saved in many molecular formats, including .mol2. CHARMM-GUI can be used to create customised plasma membrane fragments or to set up other common biological simulations. It is highly specialised, and it was particularly useful for reaching the goal of our simulation. Although Avogadro can calculate charges, we found that Atomic Charge Calculator II might yield more accurate results. While this program can also generate a .mol2 file, the atoms in that file lack detailed Sybyl atom type information. For that reason, we recommend using the text file it generates to carry the charge information through. Note, however, that this combination of programs might not be optimal for all applications. If your simulation goal is very different from ours, we recommend doing extra research for your specific case.

To reap all the benefits of the .mol2 format, it is crucial to understand its syntax. In most circumstances, these files follow a general structure. Components that are useful for LAMMPS data file generation in most cases are written in red.

For more information about the .mol2 format, see the manual.

Common format:

@<TRIPOS>MOLECULE
*****
 [number of atoms] [number of bonds] 0 0 0
[extra information]
[extra information]

@<TRIPOS>ATOM
     [atom ID] [atom name*] [x] [y] [z] [atom type**] [mol ID] [mol name] [charge]
     …
@<TRIPOS>BOND
     [bond ID] [atom ID1] [atom ID2] [bond multiplicity]
     …

*The atom name is a unique label assigned by the software that generated the file. In most cases, it carries no useful information.

**The atom type follows the Sybyl .mol2 syntax. In addition to the element symbol, it can encode useful data such as hybridisation and, in certain cases, atom connectivity. For a comprehensive list of .mol2 atom types, see the manual.
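To make the layout above concrete, here is a minimal Python sketch that reads the ATOM and BOND records into plain dictionaries. It is an illustration only and is not the C++ program described in this post. The function name `parse_mol2`, the dictionary keys, and the defaults used for missing trailing columns are assumptions, and the water sample uses made-up charges rather than calculated ones.

```python
def parse_mol2(text):
    """Return (atoms, bonds) parsed from the text of a .mol2 file."""
    atoms, bonds = [], []
    section = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("@<TRIPOS>"):
            # A record type indicator such as @<TRIPOS>ATOM starts a new section.
            section = line[len("@<TRIPOS>"):].upper()
            continue
        fields = line.split()
        if section == "ATOM" and len(fields) >= 6:
            atoms.append({
                "id": int(fields[0]),
                "name": fields[1],
                "xyz": (float(fields[2]), float(fields[3]), float(fields[4])),
                "type": fields[5],                   # Sybyl type, e.g. "O.3"
                "element": fields[5].split(".")[0],  # part before the dot
                # Trailing columns may be absent; these defaults are assumptions.
                "mol_id": int(fields[6]) if len(fields) > 6 else 1,
                "mol_name": fields[7] if len(fields) > 7 else "",
                "charge": float(fields[8]) if len(fields) > 8 else 0.0,
            })
        elif section == "BOND" and len(fields) >= 4:
            bonds.append({
                "id": int(fields[0]),
                "atoms": (int(fields[1]), int(fields[2])),
                "order": fields[3],  # kept as text: "1", "2", "ar", ...
            })
    return atoms, bonds


# Hypothetical sample file; the charges are illustrative placeholders.
SAMPLE = """@<TRIPOS>MOLECULE
water
 3 2 0 0 0
SMALL
USER_CHARGES

@<TRIPOS>ATOM
      1 O1   0.0000  0.0000  0.0000 O.3  1 HOH1 -0.4100
      2 H1   0.9572  0.0000  0.0000 H    1 HOH1  0.2050
      3 H2  -0.2400  0.9266  0.0000 H    1 HOH1  0.2050
@<TRIPOS>BOND
     1 1 2 1
     2 1 3 1
"""

atoms, bonds = parse_mol2(SAMPLE)
print(len(atoms), "atoms,", len(bonds), "bonds")
```

Keeping the bond order as text rather than converting it to a number avoids failing on non-numeric Sybyl bond types such as "ar" (aromatic).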

Once the .mol2 file is finalised, the next step is to convert it into a LAMMPS data file. Once again, we found no free-to-use software that performs this conversion, but it can be done with custom code: the LAMMPS data file requirements are simple enough that a relatively basic data manipulation program can be written. I have written an open-source C++ program that can be altered to suit any needs. It does not cover all possible atom, bond, angle, dihedral, or improper styles, but it can easily be modified to support any other style or combination of styles. The latest version of my code is available on our blog (see link).

To use the code, provide a .mol2 file named “input.mol2” in the same folder as the executable. If relevant, a text file containing all the charge information should be placed in the same folder and named “chargefile.txt”. The program then asks a series of questions in the prompt window about how the data should be generated. It also asks whether the user would like to assign all parameters in the same window or by editing the generated data file afterwards. Bonds, angles, and dihedrals are generated automatically, while the generation of impropers is semi-automatic: since impropers are less commonly used, the code asks the user to confirm which of the initially generated improper types should be applied to the system. Once the LAMMPS data file has been written, it can be found in the same folder as the executable, named “output.dat”. If any questions arise about the syntax or inner workings of my code, please do not hesitate to contact me at lukas.supragonas.19@ucl.ac.uk.
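As an illustration of how angles and dihedrals can be derived automatically from the bond list alone, here is a short Python sketch. It is one possible approach under simple assumptions and is not taken from the C++ program described above. All function names are hypothetical. The improper candidates are simply atoms with exactly three bonded neighbours, which would still need to be confirmed by hand, and the atom ordering expected within an improper depends on the improper style chosen in LAMMPS.

```python
from collections import defaultdict


def build_neighbors(bonds):
    """Map each atom ID to the set of atom IDs it is bonded to."""
    neighbors = defaultdict(set)
    for a, b in bonds:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return neighbors


def find_angles(bonds):
    """Every i-j-k triple in which the central atom j is bonded to i and k."""
    neighbors = build_neighbors(bonds)
    angles = []
    for j in sorted(neighbors):
        around = sorted(neighbors[j])
        for n, i in enumerate(around):
            for k in around[n + 1:]:
                angles.append((i, j, k))
    return angles


def find_dihedrals(bonds):
    """Every i-j-k-l chain built around a central bond j-k."""
    neighbors = build_neighbors(bonds)
    dihedrals = []
    for j, k in sorted(tuple(sorted(b)) for b in bonds):
        for i in sorted(neighbors[j] - {k}):
            for l in sorted(neighbors[k] - {j}):
                if i != l:  # i == l only happens in a three-membered ring
                    dihedrals.append((i, j, k, l))
    return dihedrals


def find_improper_candidates(bonds):
    """Atoms with exactly three neighbours, listed as (centre, n1, n2, n3)."""
    neighbors = build_neighbors(bonds)
    return [
        (j, *sorted(neighbors[j]))
        for j in sorted(neighbors)
        if len(neighbors[j]) == 3
    ]


# Carbon skeleton of a four-atom chain: 1-2-3-4
chain = [(1, 2), (2, 3), (3, 4)]
print(find_angles(chain))     # [(1, 2, 3), (2, 3, 4)]
print(find_dihedrals(chain))  # [(1, 2, 3, 4)]
```

Each angle and dihedral found this way still has to be assigned a type, for example by grouping on the atom types of its members, before it can be written to the data file.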

The last component of the data file that we have not yet discussed is the set of coefficients for all pair, atom, bond, angle, dihedral, and improper types. CHARMM-GUI can generate all the necessary force fields with their coefficients. We have also learned that it is reportedly possible to obtain these parameters with software such as ORCA and Gaussian, or by running ab initio quantum mechanical simulations. However, we did not succeed with either approach. Instead, for our case study we used parameters reported in the scientific literature, together with bond and angle data generated by Avogadro after geometry optimisation. Ideally, all coefficients should be generated using the more rigorous methods described above. However, generalised data can be suitable for simple applications that do not require high precision.
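For readers who have not seen coefficients inside a data file before, the Python sketch below formats a coefficient section as a header line, a blank line, and then one line per type ID. The helper name `coeff_section` is hypothetical, and the numbers are round placeholders rather than real force field parameters. The meaning and number of values on each line depend on the style chosen in the LAMMPS input file; for the harmonic bond style, for instance, they are the force constant and the equilibrium distance.

```python
def coeff_section(title, coeffs):
    """Format one coefficient section: header, blank line, 'type-ID values...'."""
    lines = [title, ""]
    for type_id in sorted(coeffs):
        values = " ".join(f"{v:g}" for v in coeffs[type_id])
        lines.append(f"{type_id} {values}")
    return "\n".join(lines) + "\n"


# Placeholder values only; replace them with parameters from a force field.
bond_coeffs = {1: (100.0, 1.0)}     # harmonic bond: K, r0
angle_coeffs = {1: (50.0, 109.5)}   # harmonic angle: K, theta0 in degrees

print(coeff_section("Bond Coeffs", bond_coeffs))
print(coeff_section("Angle Coeffs", angle_coeffs))
```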


Author: Lukas Supragonas
