The 16th CCKS was held in Qinhuangdao on August 26, 2022. The CCKS technology evaluation aims to provide a platform and resources for researchers to test knowledge graph and semantic computing technologies, algorithms and systems, to advance the development of knowledge graph technology in China, and to promote the integration of academic achievements with industrial needs. According to the official announcement, this year's evaluation comprised 14 tasks covering 5 topics in fields including finance, education, military and chemistry; 5,362 teams with nearly 23,000 participants took part in the competition.
Yang Wenming, Yin Le and Zou Jiali, members of the AI department of METiS Pharmaceuticals, stood out from the many other teams and won the evaluation task "Construction and Application of Chemical Elemental Knowledge Graph" by a decisive margin.
The main purpose of compound property prediction is to quickly identify compounds with substandard physical and chemical properties, so as to reduce the risk when candidates enter clinical trials and to improve the success rate of drug development. Traditionally, compound properties are predicted and analyzed through experiments, which is costly and time-consuming. A large body of research has shown that machine learning, and deep learning in particular, has great potential for compound property prediction. In drug research and development, compounds are represented either as sequences (SMILES) or as graphs (atoms as nodes, chemical bonds as edges), and sequence models or graph neural networks (GNNs) are used to predict their properties, improving efficiency and shortening the research cycle. However, these approaches usually consider only the structural information of the compound molecules and not chemical knowledge. The organizer therefore constructed a chemical element knowledge graph based on the periodic table of elements and set evaluation tasks on the key technologies and applications of knowledge graph construction.
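The graph representation described above can be illustrated with a minimal sketch. It is not the organizer's or the team's code: the ethanol graph is written out by hand, the three-element vocabulary is an arbitrary choice for illustration, and a single sum-aggregation step stands in for a real GNN layer.

```python
# Ethanol, SMILES "CCO", written out by hand as a graph (hydrogens omitted).
atoms = ["C", "C", "O"]        # nodes: one entry per heavy atom
bonds = [(0, 1), (1, 2)]       # edges: undirected single bonds

# Hypothetical element vocabulary, used only to build toy node features.
VOCAB = ["C", "N", "O"]


def one_hot(symbol):
    """Encode an element symbol as a one-hot vector over VOCAB."""
    return [1.0 if symbol == v else 0.0 for v in VOCAB]


def adjacency(n_atoms, bond_list):
    """Build a neighbour list from undirected bonds."""
    neighbours = {i: [] for i in range(n_atoms)}
    for a, b in bond_list:
        neighbours[a].append(b)
        neighbours[b].append(a)
    return neighbours


def message_pass(features, neighbours):
    """One simplified GNN step: each atom adds its neighbours' features to its own."""
    updated = []
    for i, own in enumerate(features):
        agg = list(own)
        for j in neighbours[i]:
            agg = [a + b for a, b in zip(agg, features[j])]
        updated.append(agg)
    return updated


def readout(features):
    """Sum-pool the atom vectors into a single molecule-level vector."""
    return [sum(column) for column in zip(*features)]


node_features = [one_hot(a) for a in atoms]
neighbours = adjacency(len(atoms), bonds)
updated = message_pass(node_features, neighbours)
molecule_vector = readout(updated)

print(updated)           # per-atom vectors after one aggregation step
print(molecule_vector)   # molecule-level vector
```

After one step, each atom's vector already reflects its bonded neighbours, which is the structural information a GNN exploits. Chemical knowledge such as element attributes from the periodic table is not present in this representation, which is the gap the knowledge graph task addresses.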
The METiS team used the following approaches to represent compound molecules: 1) graph data pre-training, in which the atoms of a molecule are treated as nodes, the chemical bonds as edges, and the molecule as a whole as graph data; 2) molecular descriptors; 3) pharmacophore fingerprints. The pre-trained model comprehensively incorporates the 2D information, 3D information, functional groups and other features of the molecules. First, feature vectors are generated from these different approaches to characterize the molecules. Second, the high-dimensional sparse vectors (functional group) are reduced to lower-dimensional dense ones, and the vectors carrying the various features are then merged into a final feature vector. Lastly, the final feature vectors are fed into an ensemble model, which predicts the compound properties accurately.
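The last two steps above, merging the feature vectors and predicting with an ensemble, can be sketched as follows. This is an illustration under stated assumptions, not the team's implementation: the feature values are made up, the three "models" are placeholder functions rather than trained predictors, and simple averaging is assumed as the ensembling rule.

```python
def fuse(*blocks):
    """Concatenate several per-molecule feature blocks into one vector."""
    fused = []
    for block in blocks:
        fused.extend(block)
    return fused


def ensemble_predict(models, features, weights=None):
    """Combine the predictions of several models by (weighted) averaging."""
    predictions = [model(features) for model in models]
    if weights is None:
        weights = [1.0 / len(predictions)] * len(predictions)
    return sum(w * p for w, p in zip(weights, predictions))


# Toy feature blocks for one molecule (all values invented for illustration).
pretrained_embedding = [0.5, 1.5]    # from a pre-trained graph model
descriptors = [2.0]                  # molecular descriptors
reduced_fingerprint = [1.0, 3.0]     # fingerprint after dimension reduction

fused = fuse(pretrained_embedding, descriptors, reduced_fingerprint)

# Placeholder predictors standing in for trained models (hypothetical).
models = [
    lambda f: sum(f) / len(f),
    lambda f: max(f),
    lambda f: f[0],
]

prediction = ensemble_predict(models, fused)
print(len(fused), prediction)
```

Concatenation keeps each feature family intact, so the downstream models can weigh them independently; averaging is only one of several common ways to combine the individual predictions.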
For the specific methods, refer to the evaluation paper published at CCKS 2022:
"Compound property prediction based on multiple different molecular features and ensemble learning" (website: www.sigkg.cn/ccks2022)
Schematic Diagram for Molecular Characterization
Integrating Pharmacoprint (Gobbi) improves the final evaluation metrics, but the pharmacophore fingerprint is a high-dimensional sparse vector composed mainly of 0s and 1s. For features with such excessive dimensionality, an AutoEncoder is used to reduce the dimension, and the label information is also fully used during dimension reduction. The model structure is shown in the following figure. The processed molecular features are concatenated and merged, and then fed into the model for training.
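One way label information can enter the dimension reduction is to train the autoencoder on a joint objective: reconstruction error plus a label prediction error computed from the compressed code. The sketch below assumes exactly that and nothing more. It uses a linear encoder and decoder, invented 8-bit toy fingerprints and labels, and hypothetical hyperparameters (`lam`, `lr`, code size 2); the team's actual architecture is the one described in their paper.

```python
import random


def encode(W, x):
    """Compress a d-dimensional fingerprint into k dimensions (z = W x)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def decode(V, z):
    """Reconstruct the d-dimensional fingerprint from the code (x_hat = V z)."""
    return [sum(v * zi for v, zi in zip(row, z)) for row in V]


def joint_loss(W, V, u, X, y, lam):
    """Mean of reconstruction error plus lam times squared label error."""
    total = 0.0
    for x, t in zip(X, y):
        z = encode(W, x)
        recon = sum((a - b) ** 2 for a, b in zip(decode(V, z), x))
        y_hat = sum(ui * zi for ui, zi in zip(u, z))
        total += recon + lam * (y_hat - t) ** 2
    return total / len(X)


def train_step(W, V, u, X, y, lam, lr):
    """One full-batch gradient descent step on the joint loss (in place)."""
    d, k, n = len(V), len(W), len(X)
    gW = [[0.0] * d for _ in range(k)]
    gV = [[0.0] * k for _ in range(d)]
    gu = [0.0] * k
    for x, t in zip(X, y):
        z = encode(W, x)
        e = [a - b for a, b in zip(decode(V, z), x)]      # reconstruction residual
        r = sum(ui * zi for ui, zi in zip(u, z)) - t      # label residual
        for i in range(k):
            # Gradient w.r.t. code unit i: from the decoder and from the label head.
            gz = 2 * sum(e[j] * V[j][i] for j in range(d)) + 2 * lam * r * u[i]
            gu[i] += 2 * lam * r * z[i] / n
            for m in range(d):
                gW[i][m] += gz * x[m] / n
            for j in range(d):
                gV[j][i] += 2 * e[j] * z[i] / n
    for i in range(k):
        u[i] -= lr * gu[i]
        for m in range(d):
            W[i][m] -= lr * gW[i][m]
        for j in range(d):
            V[j][i] -= lr * gV[j][i]


# Toy sparse binary fingerprints and property labels (invented for illustration).
X = [
    [1, 0, 0, 1, 0, 0, 1, 0],
    [1, 0, 0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 0, 0, 1],
    [0, 1, 0, 0, 1, 0, 0, 0],
    [1, 0, 1, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 1, 0, 1],
]
y = [1.0, 0.9, -1.0, -0.8, 1.1, -1.1]

d, k, lam, lr = 8, 2, 1.0, 0.01      # hypothetical settings
rng = random.Random(0)
W = [[rng.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(k)]  # encoder
V = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(d)]  # decoder
u = [rng.uniform(-0.1, 0.1) for _ in range(k)]                      # label head

initial_loss = joint_loss(W, V, u, X, y, lam)
for _ in range(500):
    train_step(W, V, u, X, y, lam, lr)
final_loss = joint_loss(W, V, u, X, y, lam)

dense_codes = [encode(W, x) for x in X]   # low-dimensional dense features
print(initial_loss, final_loss)
```

Because the label term shares the encoder with the reconstruction term, the compressed code is pushed to retain the fingerprint bits that matter for the target property rather than only those that are easiest to reconstruct.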
The win demonstrates that METiS has made great progress in AI pharmaceuticals in terms of algorithm and model-building capability.
The team's technical solution has now been successfully applied to AiLNP and AiTEM, the two core technology platforms of METiS. It serves as an efficient supporting tool for predicting LNP and solid dispersion systems, enabling "AI pharmaceutical" development, shortening the development cycle and improving R&D efficiency, so as to achieve more effective design of innovative delivery materials and more advantageous formulation systems. In the future, METiS will continue to invest in fundamental research and pharmaceutical R&D in AI pharmaceuticals, drawing on its leading scientific research and innovation capabilities, and will keep exploring the uncharted areas of biology and pharmaceuticals with the help of AI technology, to ultimately address unmet clinical needs.