- (ChemBERT) Automated Chemical Reaction Extraction from Scientific Literature
Journal of Chemical Information and Modeling 2022
[Paper] [GitHub] [Model (Base)]
- (MatSciBERT) MatSciBERT: A Materials Domain Language Model for Text Mining and Information Extraction
npj Computational Materials 2022
[Paper] [GitHub] [Model (Base)]
- (MatBERT) Quantifying the Advantage of Domain-Specific Pre-training on Named Entity Recognition Tasks in Materials Science
Patterns 2022
[Paper] [GitHub]
- (BatteryBERT) BatteryBERT: A Pretrained Language Model for Battery Database Enhancement
Journal of Chemical Information and Modeling 2022
[Paper] [GitHub] [Model (Base)]
- (MaterialsBERT) A General-Purpose Material Property Data Extraction Pipeline from Large Polymer Corpora using Natural Language Processing
npj Computational Materials 2023
[Paper] [Model (Base)]
- (Recycle-BERT) Recycle-BERT: Extracting Knowledge about Plastic Waste Recycling by Natural Language Processing
ACS Sustainable Chemistry & Engineering 2023
[Paper] [GitHub]
- (CatBERTa) Catalyst Property Prediction with CatBERTa: Unveiling Feature Exploration Strategies through Large Language Models
ACS Catalysis 2023
[Paper] [GitHub]
- (LLM-Prop) LLM-Prop: Predicting Physical and Electronic Properties of Crystalline Solids from Their Text Descriptions
arXiv 2023
[Paper] [GitHub]
- (ChemDFM) ChemDFM: Dialogue Foundation Model for Chemistry
arXiv 2024
[Paper] [GitHub] [Model (13B)]
- (CrystalLLM) Fine-Tuned Language Models Generate Stable Inorganic Materials as Text
ICLR 2024
[Paper] [GitHub]
- (ChemLLM) ChemLLM: A Chemical Large Language Model
arXiv 2024
[Paper] [Model (7B)]
- (LlaSMol) LlaSMol: Advancing Large Language Models for Chemistry with a Large-Scale, Comprehensive, High-Quality Instruction Tuning Dataset
COLM 2024
[Paper] [GitHub] [Model (6.7B, Galactica)] [Model (7B, LLaMA-2)] [Model (7B, Mistral)]
- (KALE-LM) KALE-LM: Unleash The Power Of AI For Science Via Knowledge And Logic Enhanced Large Model
arXiv 2024
[Paper] [Model (8B)]
- (Text2Mol) Text2Mol: Cross-Modal Molecule Retrieval with Natural Language Queries
EMNLP 2021
[Paper] [GitHub]
- (KV-PLM) A Deep-learning System Bridging Molecule Structure and Biomedical Text with Comprehension Comparable to Human Professionals
Nature Communications 2022
[Paper] [GitHub] [Model (Base)]
- (MolT5) Translation between Molecules and Natural Language
EMNLP 2022
[Paper] [GitHub] [Model (60M)] [Model (220M)] [Model (770M)]
- (MoMu) A Molecular Multimodal Foundation Model Associating Molecule Graphs with Natural Language
arXiv 2022
[Paper] [GitHub]
- (MoleculeSTM) Multi-modal Molecule Structure-text Model for Text-Based Retrieval and Editing
Nature Machine Intelligence 2023
[Paper] [GitHub]
- (Text+Chem T5) Unifying Molecular and Textual Representations via Multi-task Language Modelling
ICML 2023
[Paper] [GitHub] [Model (60M)] [Model (220M)]
- (GIMLET) GIMLET: A Unified Graph-Text Model for Instruction-Based Molecule Zero-Shot Learning
NeurIPS 2023
[Paper] [GitHub] [Model (60M)]
- (MolFM) MolFM: A Multimodal Molecular Foundation Model
arXiv 2023
[Paper] [GitHub]
- (MolCA) MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal Adapter
EMNLP 2023
[Paper] [GitHub]
- (MolLM) MolLM: A Unified Language Model for Integrating Biomedical Text with 2D and 3D Molecular Representations
Bioinformatics 2024
[Paper] [GitHub]
- (InstructMol) InstructMol: Multi-Modal Integration for Building a Versatile and Reliable Molecular Assistant in Drug Discovery
COLING 2025
[Paper] [GitHub]
- (3D-MoLM) Towards 3D Molecule-Text Interpretation in Language Models
ICLR 2024
[Paper] [GitHub]
- (GIT-Mol) GIT-Mol: A Multi-modal Large Language Model for Molecular Science with Graph, Image, and Text
Computers in Biology and Medicine 2024
[Paper] [GitHub]
- (SMILES-BERT) SMILES-BERT: Large Scale Unsupervised Pre-training for Molecular Property Prediction
ACM BCB 2019
[Paper] [GitHub]
- (MAT) Molecule Attention Transformer
arXiv 2020
[Paper] [GitHub]
- (ChemBERTa) ChemBERTa: Large-Scale Self-Supervised Pretraining for Molecular Property Prediction
arXiv 2020
[Paper] [GitHub] [Model (125M)]
- (MolBERT) Molecular Representation Learning with Language Models and Domain-Relevant Auxiliary Tasks
arXiv 2020
[Paper] [GitHub] [Model (Base)]
- (rxnfp) Mapping the Space of Chemical Reactions using Attention-Based Neural Networks
Nature Machine Intelligence 2021
[Paper] [GitHub] [Model (Base)]
- (RXNMapper) Extraction of Organic Chemistry Grammar from Unsupervised Learning of Chemical Reactions
Science Advances 2021
[Paper] [GitHub]
- (MoLFormer) Large-Scale Chemical Language Representations Capture Molecular Structure and Properties
Nature Machine Intelligence 2022
[Paper] [GitHub] [Model (47M)]
- (Chemformer) Chemformer: A Pre-trained Transformer for Computational Chemistry
Machine Learning: Science and Technology 2022
[Paper] [GitHub] [Model (45M)] [Model (230M)]
- (R-MAT) Relative Molecule Self-Attention Transformer
Journal of Cheminformatics 2024
[Paper] [GitHub]
- (MolGPT) MolGPT: Molecular Generation using a Transformer-Decoder Model
Journal of Chemical Information and Modeling 2022
[Paper] [GitHub]
- (T5Chem) Unified Deep Learning Model for Multitask Reaction Predictions with Explanation
Journal of Chemical Information and Modeling 2022
[Paper] [GitHub]
- (ChemGPT) Neural Scaling of Deep Chemical Models
Nature Machine Intelligence 2023
[Paper] [Model (4.7M)] [Model (19M)] [Model (1.2B)]
- (Uni-Mol) Uni-Mol: A Universal 3D Molecular Representation Learning Framework
ICLR 2023
[Paper] [GitHub]
- (TransPolymer) TransPolymer: A Transformer-Based Language Model for Polymer Property Predictions
npj Computational Materials 2023
[Paper] [GitHub]
- (polyBERT) polyBERT: A Chemical Language Model to Enable Fully Machine-Driven Ultrafast Polymer Informatics
Nature Communications 2023
[Paper] [GitHub] [Model (86M)]
- (MFBERT) Large-Scale Distributed Training of Transformers for Chemical Fingerprinting
Journal of Chemical Information and Modeling 2022
[Paper] [GitHub]
- (SPMM) Bidirectional Generation of Structure and Properties Through a Single Molecular Foundation Model
Nature Communications 2024
[Paper] [GitHub]
- (BARTSmiles) BARTSmiles: Generative Masked Language Models for Molecular Representations
Journal of Chemical Information and Modeling 2024
[Paper] [GitHub] [Model (406M)]
- (MolGen) Domain-Agnostic Molecular Generation with Self-feedback
ICLR 2024
[Paper] [GitHub] [Model (406M, BART)] [Model (7B, LLaMA)]
- (SELFormer) SELFormer: Molecular Representation Learning via SELFIES Language Models
Machine Learning: Science and Technology 2023
[Paper] [GitHub] [Model (58M)] [Model (87M)]
- (PolyNC) PolyNC: A Natural and Chemical Language Model for the Prediction of Unified Polymer Properties
Chemical Science 2024
[Paper] [GitHub] [Model (220M)]