Improving scoring-docking-screening powers of protein–ligand scoring functions using random forest

Cheng Wang, Yingkai Zhang

Research output: Contribution to journal › Article

Abstract

The development of new protein–ligand scoring functions using machine learning algorithms, such as random forest, has been of significant interest. By efficiently utilizing expanded feature sets and a large set of experimental data, random forest based scoring functions (RFbScore) can achieve better correlations to experimental protein–ligand binding data with known crystal structures; however, more extensive tests indicate that such enhancement in scoring power comes with significant under-performance in docking and screening power tests compared to traditional scoring functions. In this work, to improve scoring-docking-screening powers of protein–ligand docking functions simultaneously, we have introduced a ΔvinaRF parameterization and feature selection framework based on random forest. Our developed scoring function ΔvinaRF20, which employs 20 descriptors in addition to the AutoDock Vina score, can achieve superior performance in all power tests of both CASF-2013 and CASF-2007 benchmarks compared to classical scoring functions. The ΔvinaRF20 scoring function and its code are freely available on the web at: https://www.nyu.edu/projects/yzhang/DeltaVina.
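The abstract does not spell out the functional form of the ΔvinaRF parameterization. One reading of the Δ in the name is that a random forest is trained on the residual between the experimental affinity and a Vina-derived affinity, and the final score is the Vina term plus that learned correction. The sketch below illustrates this reading only. The data are synthetic, the 20 descriptor columns are placeholders rather than the descriptors used in the paper, and expressing the Vina score in pKd units is an assumption made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-ins (NOT real data): 300 complexes, 20 placeholder descriptors each.
n_complexes, n_descriptors = 300, 20
descriptors = rng.uniform(0.0, 1.0, size=(n_complexes, n_descriptors))
# Assumption: the Vina score has already been converted to pKd units.
vina_pkd = rng.normal(6.0, 1.5, size=n_complexes)

# Assumed ground truth: Vina is off by a term that depends on two descriptors.
experimental_pkd = (
    vina_pkd
    + 1.5 * descriptors[:, 0]
    - 1.0 * descriptors[:, 1]
    + rng.normal(0.0, 0.1, size=n_complexes)
)

train, test = slice(0, 200), slice(200, None)


def fit_delta_model(features, vina, experimental):
    """Train a random forest on the residual (experimental - Vina)."""
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(features, experimental - vina)
    return model


def delta_score(model, features, vina):
    """Final score = Vina baseline + learned correction."""
    return vina + model.predict(features)


def rmse(predicted, reference):
    """Root-mean-square error between two 1-D arrays."""
    return float(np.sqrt(np.mean((predicted - reference) ** 2)))


model = fit_delta_model(descriptors[train], vina_pkd[train], experimental_pkd[train])
corrected = delta_score(model, descriptors[test], vina_pkd[test])

baseline_rmse = rmse(vina_pkd[test], experimental_pkd[test])
corrected_rmse = rmse(corrected, experimental_pkd[test])
# Pearson correlation with experiment on the held-out complexes.
pearson_r = float(np.corrcoef(corrected, experimental_pkd[test])[0, 1])

print(f"baseline RMSE:  {baseline_rmse:.3f}")
print(f"corrected RMSE: {corrected_rmse:.3f}")
print(f"Pearson R:      {pearson_r:.3f}")
```

Because the forest only learns a correction, the Vina term still anchors the final score. That may be one reason such a design could retain docking and screening behavior while improving correlation with experiment, though the abstract itself does not state the mechanism.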

Original language: English (US)
Pages (from-to): 169-177
Number of pages: 9
Journal: Journal of Computational Chemistry
Volume: 38
Issue number: 3
DOI: 10.1002/jcc.24667
State: Published - Jan 30 2017

Fingerprint

Random Forest
Docking
Scoring
Screening
Power of Test
Parameterization
Crystal Structure
Learning algorithms
Learning systems
Feature extraction
Large Set
Feature Selection
Descriptors
Machine Learning
Enhancement
Experimental Data
Benchmark

Keywords

  • docking
  • machine learning
  • protein–ligand binding affinity
  • random forest
  • scoring function

ASJC Scopus subject areas

  • Chemistry(all)
  • Computational Mathematics

Cite this

Improving scoring-docking-screening powers of protein–ligand scoring functions using random forest. / Wang, Cheng; Zhang, Yingkai.

In: Journal of Computational Chemistry, Vol. 38, No. 3, 30.01.2017, p. 169-177.

@article{95a8d3be16084f03a3c948e388acdd79,
title = "Improving scoring-docking-screening powers of protein–ligand scoring functions using random forest",
keywords = "docking, machine learning, protein–ligand binding affinity, random forest, scoring function",
author = "Cheng Wang and Yingkai Zhang",
year = "2017",
month = "1",
day = "30",
doi = "10.1002/jcc.24667",
language = "English (US)",
volume = "38",
pages = "169--177",
journal = "Journal of Computational Chemistry",
issn = "0192-8651",
publisher = "John Wiley and Sons Inc.",
number = "3",

}

TY - JOUR

T1 - Improving scoring-docking-screening powers of protein–ligand scoring functions using random forest

AU - Wang, Cheng

AU - Zhang, Yingkai

PY - 2017/1/30

Y1 - 2017/1/30

KW - docking

KW - machine learning

KW - protein–ligand binding affinity

KW - random forest

KW - scoring function

UR - http://www.scopus.com/inward/record.url?scp=85000454204&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85000454204&partnerID=8YFLogxK

U2 - 10.1002/jcc.24667

DO - 10.1002/jcc.24667

M3 - Article

VL - 38

SP - 169

EP - 177

JO - Journal of Computational Chemistry

JF - Journal of Computational Chemistry

SN - 0192-8651

IS - 3

ER -