Determining Optimal Coarse-Grained Representation for Biomolecules Using Internal Cluster Validation Indexes

Zhenliang Wu, Yuwei Zhang, John Zenghui Zhang, Kelin Xia, Fei Xia

Research output: Contribution to journal › Article

Abstract

Developing ultracoarse-grained models for large biomolecules requires determining the optimal number of coarse-grained (CG) sites needed to represent the target. In this work, we propose using statistical internal cluster validation indexes to determine the optimal number of CG sites, with the sites themselves optimized by the essential dynamics coarse-graining method. The calculated curves of the Calinski-Harabasz and Silhouette Coefficient indexes exhibit extrema at similar CG numbers. The calculated ratios of the optimal CG numbers to the residue numbers of the fine-grained models are in the range from 4 to 2. A comparison of the stability of the index results indicates that the Calinski-Harabasz index is the better choice for determining the optimal CG representation in coarse-graining.
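
As a rough illustration of how an internal validation index can pick a number of sites, the sketch below computes the Calinski-Harabasz (CH) index for several candidate partitions and keeps the one with the largest value. It is not the authors' implementation: the one-dimensional toy coordinates, the helper `contiguous_labels` (an equal-size split along the sequence standing in for an optimized essential dynamics coarse-graining partition), and the candidate range of k are all assumptions made for this example.

```python
from collections import defaultdict


def calinski_harabasz(points, labels):
    """CH index: (between-cluster dispersion / (k - 1)) / (within-cluster dispersion / (n - k))."""
    n = len(points)
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)
    k = len(groups)
    if k < 2 or k >= n:
        raise ValueError("the CH index needs 2 <= k < n clusters")
    dim = len(points[0])

    def mean(members):
        return [sum(p[d] for p in members) / len(members) for d in range(dim)]

    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    overall = mean(points)
    between = 0.0
    within = 0.0
    for members in groups.values():
        centroid = mean(members)
        between += len(members) * sqdist(centroid, overall)
        within += sum(sqdist(p, centroid) for p in members)
    return (between / (k - 1)) / (within / (n - k))


def contiguous_labels(n, k):
    """Hypothetical stand-in partition: k nearly equal contiguous blocks of n residues."""
    return [i * k // n for i in range(n)]


# Toy "residue" coordinates (assumed data): three well-separated groups of three.
points = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,), (10.0,), (10.1,), (10.2,)]

# Score each candidate number of CG sites and keep the maximum of the CH curve.
scores = {k: calinski_harabasz(points, contiguous_labels(len(points), k)) for k in (2, 3, 4)}
best_k = max(scores, key=scores.get)
print(best_k)
```

On this toy data the CH curve peaks at three sites, matching the three groups built into the coordinates. In practice the partition for each candidate k would come from the coarse-graining optimization rather than an equal-size split.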

Original language: English (US)
Journal: Journal of Computational Chemistry
DOI: 10.1002/jcc.26070
State: Accepted/In press - Jan 1 2019

Fingerprint

Cluster Validation
Biomolecules
Internal
Coarse-graining
Silhouette
Extremum
Curve
Target
Coefficient
Model
Range of data

Keywords

  • CH index
  • coarse-graining
  • internal cluster validation index
  • optimal CG sites
  • SC index

ASJC Scopus subject areas

  • Chemistry (all)
  • Computational Mathematics

Cite this

Determining Optimal Coarse-Grained Representation for Biomolecules Using Internal Cluster Validation Indexes. / Wu, Zhenliang; Zhang, Yuwei; Zhang, John Zenghui; Xia, Kelin; Xia, Fei.

In: Journal of Computational Chemistry, 01.01.2019.


@article{177ca48282334dd78bb349ba9a038a14,
title = "Determining Optimal Coarse-Grained Representation for Biomolecules Using Internal Cluster Validation Indexes",
abstract = "The development of ultracoarse-grained models for large biomolecules needs to derive the optimal number of coarse-grained (CG) sites to represent the targets. In this work, we propose to use the statistical internal cluster validation indexes to determine the optimal number of CG sites that are optimized based on the essential dynamics coarse-graining method. The calculated curves of Calinski-Harabasz and Silhouette Coefficient indexes exhibit the extrema corresponding to the similar CG numbers. The calculated ratios of the optimal CG numbers to the residue numbers of fine-grained models are in the range from 4 to 2. The comparison of the stability of index results indicates that Calinski-Harabasz index is the better choice to determine the optimal CG representation in coarse-graining.",
keywords = "CH index, coarse-graining, internal cluster validation index, optimal CG sites, SC index",
author = "Zhenliang Wu and Yuwei Zhang and Zhang, {John Zenghui} and Kelin Xia and Fei Xia",
year = "2019",
month = "1",
day = "1",
doi = "10.1002/jcc.26070",
language = "English (US)",
journal = "Journal of Computational Chemistry",
issn = "0192-8651",
publisher = "John Wiley and Sons Inc.",

}

TY - JOUR

T1 - Determining Optimal Coarse-Grained Representation for Biomolecules Using Internal Cluster Validation Indexes

AU - Wu, Zhenliang

AU - Zhang, Yuwei

AU - Zhang, John Zenghui

AU - Xia, Kelin

AU - Xia, Fei

PY - 2019/1/1

Y1 - 2019/1/1

N2 - The development of ultracoarse-grained models for large biomolecules needs to derive the optimal number of coarse-grained (CG) sites to represent the targets. In this work, we propose to use the statistical internal cluster validation indexes to determine the optimal number of CG sites that are optimized based on the essential dynamics coarse-graining method. The calculated curves of Calinski-Harabasz and Silhouette Coefficient indexes exhibit the extrema corresponding to the similar CG numbers. The calculated ratios of the optimal CG numbers to the residue numbers of fine-grained models are in the range from 4 to 2. The comparison of the stability of index results indicates that Calinski-Harabasz index is the better choice to determine the optimal CG representation in coarse-graining.

AB - The development of ultracoarse-grained models for large biomolecules needs to derive the optimal number of coarse-grained (CG) sites to represent the targets. In this work, we propose to use the statistical internal cluster validation indexes to determine the optimal number of CG sites that are optimized based on the essential dynamics coarse-graining method. The calculated curves of Calinski-Harabasz and Silhouette Coefficient indexes exhibit the extrema corresponding to the similar CG numbers. The calculated ratios of the optimal CG numbers to the residue numbers of fine-grained models are in the range from 4 to 2. The comparison of the stability of index results indicates that Calinski-Harabasz index is the better choice to determine the optimal CG representation in coarse-graining.

KW - CH index

KW - coarse-graining

KW - internal cluster validation index

KW - optimal CG sites

KW - SC index

UR - http://www.scopus.com/inward/record.url?scp=85073933916&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85073933916&partnerID=8YFLogxK

U2 - 10.1002/jcc.26070

DO - 10.1002/jcc.26070

M3 - Article

AN - SCOPUS:85073933916

JO - Journal of Computational Chemistry

JF - Journal of Computational Chemistry

SN - 0192-8651

ER -