VecMol

Published

3D Molecular Generation — Vector-Field Representations

Diffusion Model, EGNN, Vector Field, Molecular Geometry

Proposed a vector-field representation for 3D molecular generation that models a continuous molecular field rather than generating atomic coordinates directly. This approach enables more efficient and physically plausible 3D molecule generation. Accepted at ICML 2026.

Mass Spectrum ↔ Molecule Retrieval

In Progress

Cross-modal Retrieval for Metabolite Identification

DreaMS, MoLFormer, Contrastive Learning, Retrieval

Developed a CLIP-style cross-modal retrieval framework that aligns tandem mass spectra with molecular representations. This enables direct retrieval of candidate molecules from spectral queries, facilitating metabolite identification without requiring exhaustive database search.
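As a rough illustration of what "CLIP-style" alignment means here, the sketch below computes a symmetric contrastive loss over paired spectrum and molecule embeddings and then ranks candidate molecules for a spectral query by cosine similarity. It is a minimal sketch, not the project's implementation: the toy 2D vectors stand in for encoder outputs (for example from DreaMS and MoLFormer), the temperature of 0.07 is an assumed value, and the function names `clip_loss` and `retrieve` are hypothetical.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length (assumes the vector is non-zero).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def similarity_logits(spec_embs, mol_embs, temperature):
    # Cosine similarity between every spectrum and every molecule,
    # divided by the temperature to sharpen the softmax.
    spec = [l2_normalize(v) for v in spec_embs]
    mol = [l2_normalize(v) for v in mol_embs]
    return [[sum(a * b for a, b in zip(s, m)) / temperature for m in mol]
            for s in spec]

def diagonal_cross_entropy(logits):
    # Mean cross-entropy where the correct class for row i is column i,
    # i.e. spectrum i is paired with molecule i.
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)

def clip_loss(spec_embs, mol_embs, temperature=0.07):
    # Symmetric loss: spectrum-to-molecule plus molecule-to-spectrum.
    # The temperature default is an assumption for this sketch.
    logits = similarity_logits(spec_embs, mol_embs, temperature)
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (diagonal_cross_entropy(logits)
                  + diagonal_cross_entropy(transposed))

def retrieve(query_emb, candidate_embs, top_k=3):
    # Return indices of the top_k candidates by cosine similarity.
    q = l2_normalize(query_emb)
    scores = [sum(a * b for a, b in zip(q, l2_normalize(c)))
              for c in candidate_embs]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:top_k]

# Toy stand-ins for encoder outputs: spectrum i is paired with molecule i.
spectra = [[1.0, 0.1], [0.0, 1.0]]
molecules = [[0.9, 0.0], [0.1, 1.0]]

aligned = clip_loss(spectra, molecules)
shuffled = clip_loss(spectra, molecules[::-1])  # deliberately wrong pairing
best = retrieve(spectra[0], molecules, top_k=1)
```

With these toy vectors the correctly paired batch gives a lower loss than the shuffled one, which is the signal that training would use to pull matching spectra and molecules together in the shared embedding space. At query time only the `retrieve` step is needed, over precomputed candidate embeddings.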

Knowledge-guided Molecular Retrieval

Ongoing

Incorporating Chemical Priors into Retrieval Models

Fragmentation Chemistry, Biological Prior, Reaction Knowledge

Exploring how chemical fragmentation mechanisms and biological knowledge can be incorporated into retrieval models for metabolite identification. This project aims to move beyond pure embedding-based retrieval by integrating domain-specific reasoning about how molecules fragment and behave in biological systems. This is becoming the central project of my current research.

Graph Foundation Models

Shanghai AI Lab

LLM-driven Graph Data Generation

GraphGen, LLM, Synthetic Data, Foundation Model

Contributed to GraphGen, a framework that enhances supervised fine-tuning for large language models with knowledge-driven synthetic graph data generation. This work explores how structured graph data can be generated and leveraged to improve LLM performance. Submitted to KDD and EMNLP.