Predicting Molecular Energy Using Force-Field Optimized Geometries and Atomic Vector Representations Learned from an Improved Deep Tensor Neural Network

Jianing Lu, Cheng Wang, Yingkai Zhang

Research output: Contribution to journal › Article

Abstract

The use of neural networks to predict molecular properties calculated from high-level quantum mechanical calculations has made significant advances in recent years, but most models need input geometries from DFT optimizations, which limits their applicability in practice. In this work, we explored how machine learning can be used to predict molecular atomization energies and conformation stability using optimized geometries from the Merck Molecular Force Field (MMFF). On the basis of the recently introduced deep tensor neural network (DTNN) approach, we first improved its training efficiency and performed an extensive search of its hyperparameters, and developed a DTNN-7ib model which has a test accuracy of 0.34 kcal/mol mean absolute error (MAE) on the QM9 data set. Then, using atomic vector representations in the DTNN-7ib model, we employed a transfer learning (TL) strategy to train readout layers on the QM9M data set, in which QM properties are the same as in QM9 [calculated at the B3LYP/6-31G(2df,p) level] while molecular geometries are the corresponding local minima optimized with the MMFF94 force field. The developed TL-QM9M model can achieve an MAE of 0.79 kcal/mol using MMFF-optimized geometries. Furthermore, we demonstrated that the same transfer learning strategy with the same atomic vector representation can be used to develop a machine learning model that can achieve an MAE of 0.51 kcal/mol in molecular energy prediction using MMFF geometries for an eMol9-CM conformation data set, which consists of 9959 molecules and 88,234 conformations with energies calculated at the B3LYP/6-31G* level. Our results indicate that DFT-level accuracy of molecular energy prediction can be achieved using force-field optimized geometries and atomic vector representations learned from a deep tensor neural network, and that integrating molecular modeling and machine learning would be a promising approach to developing more powerful computational tools for molecular conformation analysis.
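The transfer-learning idea in the abstract (keep learned atomic vectors fixed, retrain only the readout, and score with MAE) can be illustrated with a toy sketch. This is not the DTNN-7ib or TL-QM9M implementation. The per-element vectors, the linear readout, the molecules, and the energies below are all invented for illustration, and the sketch assumes one fixed vector per element, whereas the vectors in the actual model are produced by a trained network.

```python
# Toy illustration only: every vector, molecule, and energy here is invented.
# Hypothetical frozen atomic vectors standing in for learned representations.
FROZEN_ATOM_VECTORS = {"H": [1.0, 0.0], "C": [0.0, 1.0], "O": [1.0, 1.0]}
DIM = 2


def molecule_features(atoms):
    """Sum the frozen atomic vectors over all atoms in a molecule."""
    feats = [0.0] * DIM
    for symbol in atoms:
        for k, value in enumerate(FROZEN_ATOM_VECTORS[symbol]):
            feats[k] += value
    return feats


def predict(weights, atoms):
    """Linear readout: dot product of readout weights and molecule features."""
    return sum(w * f for w, f in zip(weights, molecule_features(atoms)))


def mean_absolute_error(predicted, reference):
    """Mean absolute error between two equal-length sequences."""
    if not predicted or len(predicted) != len(reference):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


def train_readout(molecules, energies, lr=0.05, epochs=500):
    """Fit only the readout weights by gradient descent on mean squared error.

    The atomic vectors are never updated, which is the transfer-learning step.
    """
    weights = [0.0] * DIM
    n = len(molecules)
    for _ in range(epochs):
        grad = [0.0] * DIM
        for atoms, target in zip(molecules, energies):
            feats = molecule_features(atoms)
            error = sum(w * f for w, f in zip(weights, feats)) - target
            for k in range(DIM):
                grad[k] += 2.0 * error * feats[k] / n
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


# Invented training set (arbitrary energy units), consistent with a linear readout.
TRAIN_MOLECULES = [
    ["H", "H", "O"],
    ["C", "H", "H", "H", "H"],
    ["C", "O"],
    ["H", "H"],
    ["C", "O", "O"],
]
TRAIN_ENERGIES = [-3.5, -4.0, -4.5, -1.0, -7.0]

weights = train_readout(TRAIN_MOLECULES, TRAIN_ENERGIES)

# Held-out molecule with its invented reference energy.
test_molecules = [["C", "H", "H", "O"]]
test_energies = [-5.5]
predictions = [predict(weights, atoms) for atoms in test_molecules]
print("test MAE:", mean_absolute_error(predictions, test_energies))
```

Because only the readout weights are optimized, the training cost is small compared with retraining the representation, which is the practical appeal of the strategy described above.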

Original language: English (US)
Journal: Journal of Chemical Theory and Computation
DOI: 10.1021/acs.jctc.9b00001
State: Published - Jan 1 2019

Fingerprint

field theory (physics)
Tensors
tensors
Neural networks
learning
machine learning
Geometry
Conformations
geometry
Learning systems
Discrete Fourier transforms
energy
Molecular modeling
molecular properties
atomizing
Atomization
predictions
readout
education
Molecules

ASJC Scopus subject areas

  • Computer Science Applications
  • Physical and Theoretical Chemistry

Cite this

@article{2e82cc4cdde04fe99ac850a66c5deb0a,
title = "Predicting Molecular Energy Using Force-Field Optimized Geometries and Atomic Vector Representations Learned from an Improved Deep Tensor Neural Network",
abstract = "The use of neural networks to predict molecular properties calculated from high level quantum mechanical calculations has made significant advances in recent years, but most models need input geometries from DFT optimizations which limit their applicability in practice. In this work, we explored how machine learning can be used to predict molecular atomization energies and conformation stability using optimized geometries from Merck Molecular Force Field (MMFF). On the basis of the recently introduced deep tensor neural network (DTNN) approach, we first improved its training efficiency and performed an extensive search of its hyperparameters, and developed a DTNN-7ib model which has a test accuracy of 0.34 kcal/mol mean absolute error (MAE) on QM9 data set. Then using atomic vector representations in the DTNN-7ib model, we employed transfer learning (TL) strategy to train readout layers on the QM9M data set, in which QM properties are the same as in QM9 [calculated at the B3LYP/6-31G(2df,p) level] while molecular geometries are corresponding local minima optimized with MMFF94 force field. The developed TL-QM9M model can achieve an MAE of 0.79 kcal/mol using MMFF optimized geometries. Furthermore, we demonstrated that the same transfer learning strategy with the same atomic vector representation can be used to develop a machine learning model that can achieve an MAE of 0.51 kcal/mol in molecular energy prediction using MMFF geometries for an eMol9-CM conformation data set, which consists of 9959 molecules and 88 234 conformations with energies calculated at the B3LYP/6-31G∗ level. Our results indicate that DFT-level accuracy of molecular energy prediction can be achieved using force-field optimized geometries and atomic vector representations learned from deep tensor neural network, and integrated molecular modeling and machine learning would be a promising approach to develop more powerful computational tools for molecular conformation analysis.",
author = "Jianing Lu and Cheng Wang and Yingkai Zhang",
year = "2019",
month = "1",
day = "1",
doi = "10.1021/acs.jctc.9b00001",
language = "English (US)",
journal = "Journal of Chemical Theory and Computation",
issn = "1549-9618",
publisher = "American Chemical Society",
}

TY - JOUR
T1 - Predicting Molecular Energy Using Force-Field Optimized Geometries and Atomic Vector Representations Learned from an Improved Deep Tensor Neural Network
AU - Lu, Jianing
AU - Wang, Cheng
AU - Zhang, Yingkai
PY - 2019/1/1
Y1 - 2019/1/1

N2 - The use of neural networks to predict molecular properties calculated from high level quantum mechanical calculations has made significant advances in recent years, but most models need input geometries from DFT optimizations which limit their applicability in practice. In this work, we explored how machine learning can be used to predict molecular atomization energies and conformation stability using optimized geometries from Merck Molecular Force Field (MMFF). On the basis of the recently introduced deep tensor neural network (DTNN) approach, we first improved its training efficiency and performed an extensive search of its hyperparameters, and developed a DTNN-7ib model which has a test accuracy of 0.34 kcal/mol mean absolute error (MAE) on QM9 data set. Then using atomic vector representations in the DTNN-7ib model, we employed transfer learning (TL) strategy to train readout layers on the QM9M data set, in which QM properties are the same as in QM9 [calculated at the B3LYP/6-31G(2df,p) level] while molecular geometries are corresponding local minima optimized with MMFF94 force field. The developed TL-QM9M model can achieve an MAE of 0.79 kcal/mol using MMFF optimized geometries. Furthermore, we demonstrated that the same transfer learning strategy with the same atomic vector representation can be used to develop a machine learning model that can achieve an MAE of 0.51 kcal/mol in molecular energy prediction using MMFF geometries for an eMol9-CM conformation data set, which consists of 9959 molecules and 88 234 conformations with energies calculated at the B3LYP/6-31G∗ level. Our results indicate that DFT-level accuracy of molecular energy prediction can be achieved using force-field optimized geometries and atomic vector representations learned from deep tensor neural network, and integrated molecular modeling and machine learning would be a promising approach to develop more powerful computational tools for molecular conformation analysis.

UR - http://www.scopus.com/inward/record.url?scp=85067958962&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85067958962&partnerID=8YFLogxK
U2 - 10.1021/acs.jctc.9b00001
DO - 10.1021/acs.jctc.9b00001
M3 - Article
C2 - 31142110
AN - SCOPUS:85067958962
JO - Journal of Chemical Theory and Computation
JF - Journal of Chemical Theory and Computation
SN - 1549-9618
ER -