Introduction

Understanding the structure-property-function relationships of materials is fundamental to the rational design of functional devices. For functional materials, diverse application scenarios such as displays, energy conversion, and sensing impose distinct property requirements from molecules to devices. This necessitates comprehensive, multi-scale, and multi-property characterization to guide material development. However, such characterization is inherently challenging and resource-intensive1, since conventional characterizations remain costly, labor-intensive, and reliant on multidisciplinary expertise, thereby creating major bottlenecks in the development of next-generation functional materials.

Organic functional materials, a burgeoning frontier in functional materials, have found extensive applications in diverse academic and commercial areas from light-emitting diodes2,3 and organic photovoltaics4,5,6 to chemical and biological sensors7,8,9,10. These materials typically comprise conjugated molecules and are utilized in the form of solid films as the core constituents of optoelectronic devices, where the macroscopic device performance arises from a complex interplay of molecular-level optoelectronic properties (e.g., emission wavelength, photoluminescence quantum yield), mesoscale charge transport behaviors (e.g., carrier mobility), and their structural organization in thin films. Accurate and efficient characterization of these multi-level properties is critical for materials innovation11, yet remains largely constrained by the high cost and complexity of current characterization approaches. This raises a central question: can we develop virtual characterization tools that provide accurate, scalable, and cost-effective access to various key material properties across multiple length scales, thereby substantially diminishing the reliance on experimental characterization during device development?

Quantum mechanics (QM) methods have long been used to evaluate optoelectronic properties and to model charge transport, typically through the calculation of intermolecular electronic couplings (transfer integrals) combined with multi-scale simulation12,13,14. However, these approaches are computationally expensive, often scaling cubically with system size and requiring extensive sampling of microscopic configurations, which limits their practicality in large-scale materials discovery. On the other hand, data-driven approaches have exhibited great potential in predicting characterized properties with high accuracy only based on microscopic representation of materials15,16,17,18,19,20,21,22. Recent 2D graph convolutional networks have attained state-of-the-art (SOTA) accuracy in multiple optoelectronic property predictions at the molecular level23,24,25. Nevertheless, these approaches are inherently limited by their inability to incorporate essential 3D structural information, which is critical for modeling transport-related processes. Alternatively, strategies such as the Coulomb matrix representation26 or 3D graph networks27 have been employed to model transfer integrals, which are further used as the input for kinetic Monte Carlo (kMC) simulations to estimate mesoscale carrier mobility27,28,29,30. However, there remains a lack of transferable models for accurately predicting transfer integrals in thin films and subsequently estimating film mobility. Moreover, device-level performance predictions continue to rely heavily on hand-crafted descriptors derived from computationally intensive DFT or TDDFT calculations15,22. To date, no existing framework has simultaneously achieved high accuracy, efficiency, and transferability across molecular-, mesoscopic-, and device-scale virtual characterizations — leaving a long-standing gap that continues to hinder the rational design of organic functional devices.

To address this challenge, we propose OCNet, a domain-knowledge-enhanced representation learning framework for organic conjugated systems that, for the first time, enables unified and accurate virtual characterization of organic functional materials, from molecular-scale optoelectronic properties and mesoscale charge transport to device-level performance. Specifically, OCNet realizes the first deep-learning-derived molecular and bimolecular (intermolecular) representations for organic functional materials. Leveraging self-constructed databases of over ten million conjugated molecules and dimers, together with the pre-training strategy adopted in previous data-rich scenarios31,32,33,34,35,36, OCNet captures generalizable 3D features that are comparable to domain-expert feature engineering in describing intramolecular optoelectronic properties and intermolecular electronic coupling. As a result, it outperforms reported SOTA models by over 20% in predicting various key computed or experimental optoelectronic properties and intermolecular transfer integrals. Subsequently, using a self-constructed million-scale database of transfer integrals at the DFT level, OCNet realizes the first transferable model for predicting transfer integrals in thin films, enabling accurate prediction of mesoscale carrier mobility through multi-scale simulation. Finally, by integrating tight-binding-level electronic descriptors with our microscopic representation, OCNet achieves accurate, near real-time prediction of device power conversion efficiency (PCE), surpassing TDDFT-descriptor-based models by 12%. This bridges the longstanding gap between molecular design and device-level optimization. Overall, OCNet offers a unified and scalable foundation for multi-property and multi-scale virtual characterization in organic electronics. We anticipate this framework will broadly accelerate the discovery and development of organic materials for energy, display, and sensing applications.

Results

Overview of OCNet framework

Our OCNet framework (Fig. 1) employs a pre-trained Transformer architecture based on 3D geometries to extract microscopic information of organic conjugated systems. It establishes general molecular and bimolecular representations that capture microscopic optoelectronic and charge transport behaviors, including intramolecular electronic excitations and intermolecular electron hopping. At the molecular level, OCNet directly maps microscopic representations to molecular optoelectronic properties or intermolecular transfer integrals. At the mesoscopic and macroscopic scales, it connects microscopic representations to higher-level properties through either physics-driven multi-scale modeling or end-to-end data-driven pipelines. Moreover, for material properties governed by complex physical mechanisms, particularly device-level performance, OCNet supports the incorporation of expert-derived features such as electronic structure information to further enhance the expressiveness of its microscopic representations. In addition, to overcome the scarcity of large-scale databases required for effective pre-training, we construct the first 10-million-scale conjugated molecular and bimolecular databases including geometries and corresponding optoelectronic properties or transfer integrals at the tight-binding (TB) level (Fig. 1a), enabling OCNet to learn more comprehensive and transferable microscopic 3D representations.

Fig. 1: Overview of the OCNet framework.

a Construction of two large-scale pre-training databases. The molecular database is built by mining open-source databases and generating new molecules via ring fusion and fragment assembly. The bimolecular database is created from 100K molecular films obtained through MD simulations. b Development of general molecular and bimolecular representations for conjugated systems. c Integration of domain-specific features into microscopic representations. d Multi-scale virtual characterization using OCNet, spanning from the molecular to the device level.

Pre-training Database

For the molecular dataset, we incorporate 15 elements (H, B, C, N, O, F, Si, P, S, Cl, Br, I, Ir, Ge, Se) and cover three major classes of conjugated molecules: metal-organic complexes, fused-ring structures, and fragment-assembled conjugated systems, thus spanning a broad and representative chemical space. Specifically, we integrate 0.84 million Ir complex structures from a recent open-source dataset37 and 0.5 million fused-ring systems from COMPAS-2x38. Additionally, we generate 14 million molecular structures using ring fusion and fragment assembly methods (detailed in the supporting information, Figs. S1 and S2). In our ring fusion protocol, we allow carbon or heteroatoms to be shared by two or three rings (Fig. S1b), resulting in molecules with multiple resonance forms—an essential feature for optoelectronic applications such as display materials39,40. Fragment assembly further extends chemical diversity by linking conjugated fragments via carbon-carbon connections. We compare the chemical diversity of our molecular database with the open-source COMPAS-2x dataset by analyzing the distributions of heavy atom count (Fig. 2a, b) and molecular weight (Fig. S3). Most molecules in COMPAS-2x contain fewer than 50 heavy atoms and have molecular weights below 600 Da, whereas approximately 60% of the molecules in our database exceed these thresholds, indicating the inclusion of larger and more complex structures. We further benchmark the chemical space coverage of our database against COMPAS-2x and the largest open-source conjugated fragment assembly dataset, FORMED41, using t-SNE visualization of our molecular representations (Fig. 2c). The results indicate that COMPAS-2x and FORMED occupy only limited regions of the projected space, while our dataset spans a broader and more diverse range, underscoring its comprehensive coverage of conjugated chemical space.

Fig. 2: Distribution and visualization of molecular datasets.

a Heavy atom distribution in our molecular database. b Heavy atom distribution in the COMPAS-2x database38. c t-SNE visualization of molecular representations from our database, COMPAS-2x, and FORMED41.

For the bimolecular database, we sample 9.5M dimer conformations from 100K molecular films, yielding the first large-scale bimolecular database derived from thin-film environments. To construct this database, we first select molecules from our molecular database that exhibit low electron or hole reorganization energies at the GFN2-xTB level. These selected molecules are then assembled into amorphous films via MD simulations, using our previously developed GAFF-compatible force field specifically tailored for organic conjugated systems42 (see Supporting Information for details). We further demonstrate the chemical diversity of the bimolecular database by analyzing the distributions of heavy atom counts and molecular weights (Fig. S4). Our database includes dimers with up to 350 heavy atoms and molecular weights exceeding 9000 Da, indicating its extensive structural and chemical complexity.

Domain-knowledge-enhanced Microscopic Representations for Conjugated Systems

We then leverage the self-constructed molecular and bimolecular databases to pre-train the SE(3) Transformer architecture34 (Fig. 1b). In the first stage, OCNet is pre-trained to recover atomic positions of molecular and bimolecular structures using an SE(3)-equivariant head. In the second stage, we further pre-train the model to predict optoelectronic properties and intermolecular transfer integrals at the tight-binding (TB) level43 (see Supporting Information and Fig. S5). This two-stage pre-training enables OCNet to acquire rich structural and physical knowledge, resulting in a general microscopic representation that matches or even surpasses expert-designed features in downstream tasks.

To further enhance OCNet’s capability in modeling complex physical quantities, especially device-level performance, we incorporate domain knowledge by fusing our deep-learning-derived representations with expert features (e.g., TB-level electronic structure descriptors) using multilayer perceptrons (Fig. 1c, see Methodology for details). This hybrid strategy establishes OCNet as a state-of-the-art framework for virtual characterization across a wide range of organic functional materials. In the following sections, we systematically evaluate OCNet on multiple representative tasks (Fig. 1d), including molecular-level optoelectronic property prediction, mesoscopic charge transport estimation, and macroscopic device performance (PCE) modeling, to demonstrate its universality, accuracy, and efficiency. Unless otherwise specified, we adopt an 8:2 training-to-test split and report model performance using the mean absolute error (MAE) and the coefficient of determination (R2).
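
The evaluation protocol named above (an 8:2 split, MAE, and R2) can be illustrated with a minimal, standard-library sketch. The function names and the toy data are illustrative assumptions, not the authors' evaluation code; the Pearson coefficient R is included because it is reported alongside R2 for mobility and PCE later in the paper.

```python
import math
import random

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient R."""
    n = len(y_true)
    mean_t, mean_p = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

def split_80_20(items, seed=0):
    """Shuffle and split into 80% training and 20% test (seed is an arbitrary choice)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 8 // 10
    return shuffled[:cut], shuffled[cut:]

# Toy data: predictions carry a constant +0.5 offset.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.5, 3.5, 4.5]
print(mae(y_true, y_pred), r2_score(y_true, y_pred), pearson_r(y_true, y_pred))
```

On this toy data the MAE is 0.5, R2 is 0.8, and R is 1.0: a systematic offset lowers R2 but leaves R unchanged, which is why the two metrics can differ noticeably for the same model.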

Molecular-level optoelectronic property prediction

We first evaluate OCNet’s performance on predicting computed optoelectronic properties using the largest open-source dataset: OCELOT chromophores24,44. Specifically, we focus on four molecular properties that are directly relevant to downstream device design: the HOMO-LUMO gap (H-L), the lowest singlet excitation energy (S0-S1), and electron and hole reorganization energies (ER and HR). To assess OCNet’s effectiveness, we define an accuracy score as the ratio between the MAE of the reported state-of-the-art (SOTA) model and that of OCNet. Across all four properties, OCNet achieves the highest accuracy, outperforming existing methods by at least 13%, and achieves up to 60% improvement in HR prediction (Fig. 3a). OCNet’s predictions show strong agreement with quantum mechanical results (Fig. 3d), with MAEs of 0.199 eV and 0.008 eV, and R2 values of 0.803 and 0.987 for S0-S1 and H-L, respectively. For ER and HR, OCNet reaches semi-quantitative accuracy (MAEs of 0.082 eV and 0.087 eV; R2 values of 0.575 and 0.511), which is sufficient for screening low-reorganization-energy candidates.

Fig. 3: Performance of OCNet on molecular-level optoelectronic property prediction.

a Accuracy score comparing OCNet (red) with reported SOTA models (blue) for predicting computed properties. b Accuracy score comparing OCNet (red) with reported SOTA models (blue) for predicting experimental properties. c MAEs of various models, including OCNet (w/ and w/o pre-training), Uni-Mol, and reported SOTA NN, across multiple properties. d Correlation between OCNet predictions and QM-calculated properties (ER, S0-S1, H-L, and HR from left to right). e Correlation between OCNet predictions and experimental properties (PLQY, FWHM, Abs., and Emi. from left to right).

We further compare OCNet with (w/) and without (w/o) pre-training to reported SOTA models in terms of MAE and R2 across all four properties (Tables S1 and S2). OCNet w/ pre-training performs significantly better than the other models on all four optoelectronic properties. For instance, in S0-S1 prediction, it achieves an MAE of 0.199 eV and an R2 of 0.803, significantly better than both OCNet w/o pre-training (MAE: 0.318 eV; R2: 0.544) and the reported SOTA (MAE: 0.249 eV; R2: 0.76).
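
As a worked example of the accuracy score defined earlier in this section (the ratio of the reported SOTA MAE to the OCNet MAE), the following sketch uses the S0-S1 values quoted above. The function name is a hypothetical helper introduced only for illustration.

```python
def accuracy_score(mae_sota, mae_model):
    """Ratio of the reference (SOTA) MAE to the model MAE; values above 1 favor the model."""
    if mae_model <= 0:
        raise ValueError("model MAE must be positive")
    return mae_sota / mae_model

# S0-S1 excitation energy: reported SOTA MAE 0.249 eV, OCNet MAE 0.199 eV.
score = accuracy_score(0.249, 0.199)
relative_mae_reduction = 1.0 - 0.199 / 0.249

print(f"accuracy score: {score:.3f}")
print(f"relative MAE reduction: {relative_mae_reduction:.1%}")
```

For S0-S1 this gives a score of about 1.25, corresponding to an MAE roughly 20% lower than the SOTA baseline; the SOTA model itself has a score of 1.0 by construction.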

Next, we evaluate OCNet’s performance on Deep4Chem45, the largest open-source dataset of experimental optoelectronic properties. To account for solvent effects, we construct a unified representation for solute-solvent systems by concatenating the element and distance matrices of both components (Fig. S6, detailed in “Methods”). Additionally, we integrate domain features defined in SuboptGraph25 into OCNet’s molecular representation to further enhance its expressive capability. We benchmark OCNet against the reported SOTA model on four application-relevant optoelectronic properties: absorption wavelength (Abs.), emission wavelength (Emi.), photoluminescence quantum yield (PLQY), and full width at half maximum (FWHM) (Fig. 3b). OCNet outperforms the SOTA model across all four tasks, achieving 18% and 13% accuracy improvements in Abs. and Emi. predictions, respectively. While improvements for PLQY and FWHM are more modest (5%), this is expected given that these properties were not included in the pre-training stage. Correlation analysis further confirms OCNet’s strong predictive performance (Fig. 3e). For Abs. and Emi., the model achieves MAEs of 7.085 nm and 11.167 nm, with corresponding R2 values of 0.982 and 0.949. For PLQY and FWHM, OCNet attains MAEs of 0.101 and 9.123 nm, and R2 values of 0.722 and 0.719, respectively. These results are sufficient for screening candidates with the desired emission color, high quantum yield, and narrow emission bandwidth in future applications.

To evaluate the contributions of pre-training and the domain features, we compare OCNet’s performance (w/ and w/o domain features) against Uni-Mol (a general-purpose molecular representation model for drug discovery) and the reported SOTA neural network for Abs. and Emi. predictions (Fig. 3c; Tables S3 and S4). Uni-Mol exhibits significantly lower accuracy in this context, with an MAE of 16 nm for Emi., due to its lack of pre-training on a large-scale conjugated molecular database. In contrast, OCNet both w/ and w/o domain features outperforms the SOTA baseline, indicating OCNet’s strong expressive capability in experimental property prediction at the molecular scale. In addition, the integration of domain features yields only marginal improvements over OCNet w/o domain features, suggesting that for optoelectronic properties governed by relatively simple physical processes, deep-learning-derived representations are already sufficiently expressive.

Overall, all results validate the necessity of pre-training on large-scale conjugated molecular databases and the advantage of 3D deep learning over 2D graph-based approaches in predicting optoelectronic properties, demonstrating great potential for efficient, property-driven materials design.

Intermolecular charge transfer integrals prediction

We next evaluate OCNet’s performance on intermolecular electronic coupling (transfer integral) prediction, a key microscopic property that directly governs charge transport in organic semiconductors. To enhance the model’s geometric expressiveness, we incorporate physically meaningful structural descriptors inspired by Valeev et al.46, including: centroid-to-centroid distance, the angle between molecular plane normals, and the angle between the centroid vector and each molecular plane normal. For benchmarking, we adopt the OCELOT dimer dataset27,44, containing 438,000 DFT-calculated transfer integrals across approximately 25,000 molecular crystal structures. OCNet accurately predicts both HOMO-HOMO (H-H) and LUMO-LUMO (L-L) transfer integrals, achieving MAEs of 2.131 meV and 2.242 meV, and R2 values of 0.909 for both cases (Fig. 4a). Compared to the reported SOTA model, OCNet demonstrates a 50% improvement in prediction accuracy for crystal transfer integrals (Fig. 4c; Tables S5 and S6), highlighting the effectiveness of its bimolecular representation.
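
The three structural descriptors listed above can be computed from atomic coordinates alone. The sketch below is one possible realization, not the authors' implementation: it assumes unweighted geometric centroids, a least-squares molecular plane obtained by SVD, and angles folded into [0°, 90°] because a plane normal has no preferred sign. All function and key names are hypothetical.

```python
import numpy as np

def plane_normal(coords):
    """Unit normal of the least-squares plane through a set of atoms."""
    centered = coords - coords.mean(axis=0)
    # The right singular vector with the smallest singular value is the plane normal.
    _, _, vt = np.linalg.svd(centered)
    return vt[-1]

def folded_angle_deg(u, v):
    """Angle between two directions in degrees, folded into [0, 90]."""
    cos = abs(np.dot(u, v)) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(cos, 0.0, 1.0))))

def dimer_descriptors(coords_a, coords_b):
    """Centroid distance and orientation angles for a dimer (coordinates in angstrom)."""
    centroid_vec = coords_b.mean(axis=0) - coords_a.mean(axis=0)
    normal_a, normal_b = plane_normal(coords_a), plane_normal(coords_b)
    return {
        "centroid_distance": float(np.linalg.norm(centroid_vec)),
        "normal_normal_angle": folded_angle_deg(normal_a, normal_b),
        "centroid_normal_angle_a": folded_angle_deg(centroid_vec, normal_a),
        "centroid_normal_angle_b": folded_angle_deg(centroid_vec, normal_b),
    }

# Two identical planar "molecules" in a cofacial stack, 3.5 angstrom apart.
square = np.array([[1.0, 1.0, 0.0], [1.0, -1.0, 0.0], [-1.0, -1.0, 0.0], [-1.0, 1.0, 0.0]])
print(dimer_descriptors(square, square + np.array([0.0, 0.0, 3.5])))
```

For a perfectly cofacial stack all three angles are zero; shifting the second molecule sideways (a slipped stack) increases the centroid-normal angles while the normal-normal angle stays at zero.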

Fig. 4: Performance of OCNet on charge transfer integrals prediction.

a Correlation between OCNet predictions and QM-calculated H-H (left) and L-L (right) transfer integrals (TI) in the crystal environment. b Correlation between OCNet predictions and QM-calculated H-H (left) and L-L (right) TI in the film environment. c Accuracy score comparing OCNet (red) with reported SOTA models (blue) for predicting transfer integrals. d MAEs of various models, including OCNet (w/ and w/o pre-training), for transfer integral prediction.

To further assess the contribution of pre-training, we compare the performance of OCNet (w/ and w/o pre-training), alongside the reported SOTA model, on the OCELOT dataset (Fig. 4d). Without pre-training, OCNet’s accuracy declines markedly, with MAEs of 4.100 meV (H-H) and 3.300 meV (L-L), substantially higher than those of OCNet w/ pre-training (2.131 meV) and even the reported SOTA baseline (3.000 meV). These findings emphasize the importance of both the large-scale bimolecular database and the pre-training strategy in acquiring an expressive and transferable representation for modeling intermolecular electronic couplings.

Mesoscopic-level charge transport prediction

At the mesoscopic level, carrier mobility in thin films serves as a crucial parameter for evaluating charge transport efficiency in organic electronic devices14,47. However, its accurate estimation via multi-scale simulations remains a key challenge, primarily due to the reliance on transfer integrals derived from costly DFT calculations. To address this, we develop the first transferable model for predicting transfer integrals in disordered thin-film environments. We construct a large-scale DFT-level database comprising 1.8 million dimers extracted from 45,000 distinct molecular films (details in the Supporting Information). To capture the complexity of film environments, we enhance OCNet’s bimolecular representation by integrating both structural features and domain-specific, TB-level electronic descriptors, including overlap integrals and orbital-specific and total effective transfer integrals. Since no prior models exist for this task, OCNet’s performance is benchmarked with an assigned accuracy score of 1.0 (Fig. 4c). We further evaluate the correlation between OCNet-predicted and QM-calculated values of the H-H and L-L transfer integrals (TI) (Fig. 4b). OCNet demonstrates high accuracy, with R2 values of 0.844 and 0.872, and MAEs of 7.350 meV and 7.497 meV for H-H and L-L TI, respectively, indicating sufficient precision to support subsequent mobility evaluations through further multi-scale modeling.

We then randomly select 80 molecules from our molecular database and generate their thin-film structures via molecular dynamics simulations at 300 K. For each film, we evaluate all transfer integrals of dimers within a 10 Å center-of-mass distance using both OCNet and PW91/6-31G(d) methods. Reorganization energies of single molecules are also obtained at the same DFT level. These parameters are then fed into kinetic Monte Carlo (kMC) simulations to estimate charge carrier mobilities. We compare the electron mobilities of seven representative thin films based on transfer integrals from DFT, OCNet, and GFN1-xTB (Fig. 5a). The mobilities obtained using OCNet closely match those derived from DFT, whereas the GFN1-xTB-based values are significantly underestimated. Furthermore, we compare the correlation between logarithmic mobilities (log(μ)) predicted by OCNet and those calculated using DFT (Fig. 5b). The results show that the log(μ) predicted by OCNet is comparable with the DFT-calculated values, with an MAE of 0.291, an R2 of 0.713, and a Pearson correlation coefficient R of 0.939. Through physics-driven multi-scale modeling, OCNet bridges microscopic representation with mesoscopic charge transport properties, achieving a favorable balance between accuracy and efficiency for mobility evaluation. This establishes a foundation for high-throughput virtual screening of high-mobility organic semiconductors, with great potential to address the longstanding bottleneck in the discovery of efficient organic electron transport materials.
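
To show how a transfer integral and a reorganization energy enter a hopping simulation, the sketch below evaluates a pairwise hopping rate. The text does not state which rate expression the kMC simulations use; the semiclassical Marcus rate is assumed here purely for illustration, and the function name and example parameters are hypothetical.

```python
import math

HBAR_EV_S = 6.582119569e-16  # reduced Planck constant in eV*s
KB_EV_K = 8.617333262e-5     # Boltzmann constant in eV/K

def marcus_rate(j_ev, lambda_ev, delta_e_ev=0.0, temperature_k=300.0):
    """Marcus hopping rate (1/s) between two sites.

    j_ev: transfer integral (eV); lambda_ev: reorganization energy (eV);
    delta_e_ev: site energy difference, final minus initial (eV).
    """
    kt = KB_EV_K * temperature_k
    prefactor = (2.0 * math.pi / HBAR_EV_S) * j_ev ** 2
    prefactor /= math.sqrt(4.0 * math.pi * lambda_ev * kt)
    return prefactor * math.exp(-((delta_e_ev + lambda_ev) ** 2) / (4.0 * lambda_ev * kt))

# Illustrative values only: J = 10 meV, lambda = 0.2 eV, T = 300 K, no energetic offset.
rate = marcus_rate(j_ev=0.010, lambda_ev=0.2)
print(f"hopping rate: {rate:.3e} 1/s")
```

Under this assumption the rate scales with the square of the transfer integral, so an error of a few meV in a coupling of about 10 meV changes individual hopping rates noticeably; this is one reason the accuracy of the predicted film transfer integrals matters for the mobility estimate.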

Fig. 5: Performance of OCNet in predicting the mesoscopic carrier mobility and device-level PCE.

a Comparison of charge mobility (μ) calculated by DFT, OCNet, and GFN1-xTB across seven molecular films. b Correlation between OCNet and DFT predictions for \(\log (\mu )\). c Correlation between OCNet-predicted and experimentally measured PCE.

Device-level performance prediction

Although device-level performance such as PCE arises from complex physical processes and depends on the collective optoelectronic and transport properties of multiple functional layers, it is fundamentally governed by the microscopic behavior of electrons. Previous efforts, such as the work by Sahu et al.15, have explored end-to-end data-driven pipelines that link microscopic electronic structure descriptors to device-level PCE. However, their approaches rely on computationally expensive TDDFT-derived features, which limit scalability for high-throughput material screening.

In principle, by leveraging its expressive microscopic representations, OCNet may achieve accurate device performance prediction either directly based on 3D structural information or in combination with low-cost, approximate TB-level descriptors. To evaluate this, we adopt the OPV-PCE dataset created by Sahu et al. as a benchmark. To maintain consistency with Sahu’s study, we partition the dataset into 250 molecules for training and validation and 30 for testing. OCNet with TB-level descriptors achieves a test-set MAE of 0.738% in predicting PCE (Fig. 5c), demonstrating that OCNet can reliably reproduce experimental PCE values with both high accuracy and high efficiency. Furthermore, OCNet attains an R2 of 0.696 and a Pearson correlation coefficient R of 0.841, representing a significant improvement over Sahu et al.’s previous result (R = 0.79). We also evaluate the performance of OCNet w/o TB-level descriptors in predicting PCE (Fig. S7), which still surpasses Sahu et al.’s results, with an MAE of 0.756%, R2 = 0.657, and R = 0.817. These findings indicate that both 3D structural information and electronic information contribute to enhancing PCE prediction accuracy, and their integration provides a more precise molecular representation for device modeling. We also assess the computational efficiency of OCNet. The generation of TB-level descriptors requires approximately 0.08 CPU hours on an Intel Xeon Platinum 8163 2.5 GHz processor, while inference with OCNet takes only 0.005 seconds on an NVIDIA 4090 GPU. In contrast, TDDFT-derived descriptors typically demand over 1000 CPU hours. Thus, OCNet enables near real-time predictions, combining high accuracy with exceptional computational efficiency. By bridging this critical gap, we believe OCNet offers a promising pathway toward the end-to-end design of high-performance organic electronic devices.

Discussion

In summary, we present OCNet, a domain-knowledge-enhanced representation learning framework for organic conjugated systems that, for the first time, enables multi-scale virtual characterization, spanning from molecular properties to mesoscale film behavior and macroscopic device performance. To achieve this, we construct the first deep-learning-derived molecular and bimolecular representations for organic functional materials. Leveraging a self-generated database of over ten million conjugated molecules and dimers and a pre-training strategy, OCNet learns generalizable 3D features comparable to domain-expert-crafted descriptors for modeling intramolecular and intermolecular electronic behaviors. As a result, it outperforms reported SOTA models by over 20% in predicting various key computed or experimental optoelectronic properties and intermolecular transfer integrals. Furthermore, trained on a self-constructed million-scale transfer integral database at the DFT level, OCNet provides the first transferable model for predicting thin-film transfer integrals, enabling accurate mesoscale carrier mobility estimation through multi-scale simulations. At the device level, by integrating tight-binding-level electronic descriptors with our microscopic representation, OCNet achieves, for the first time, near real-time prediction of PCE with high accuracy, surpassing TDDFT-descriptor-based models by 12%. Taken together, OCNet offers a unified and scalable tool for accurate virtual characterization of various key material properties across multiple length scales, significantly reducing the reliance on resource-intensive characterization to establish structure-property-function relationships and offering broad applicability in accelerating materials design for photovoltaics, displays, and sensing.

However, in this work we have not yet employed OCNet to design new molecules and validate their performance through wet-lab experiments. To further advance OCNet’s capabilities, we aim to integrate OCNet-based virtual characterization with high-throughput experiments in future studies. This closed-loop research paradigm will extend OCNet’s utility in data-scarce scenarios, ultimately enabling fully intelligent design of organic materials.

Methods

Architecture of microscopic representation

To construct general and transferable molecular and bimolecular representations of organic functional materials, we first encode atomic numbers and pairwise distances to capture both atomic and 3D spatial information. We then use the self-attention mechanism in the Transformer architecture to update and couple these representations, enabling the model to capture complex interactions within molecules or bimolecules. Similar to the CLS token in BERT33, which aggregates sequence-level representations for 1D tasks, we select the geometric center of the molecule or bimolecule as the CLS atom to aggregate atomic features. This method reflects the overall structural characteristics. The initial atomic representation is given by:

$${{\bf{x}}}^{0}={[{\rm{emb}}({\rm{CLS}}),{\rm{emb}}({Z}_{0})\ldots {\rm{emb}}({Z}_{n}),{\rm{emb}}({\rm{PAD}}),\ldots {\rm{emb}}({\rm{PAD}})]}_{{n}_{\max }+1}$$
(1)

where \({Z}_{i}\) represents the vocabulary index of the i-th atom in the molecule or bimolecule. All atoms within the molecule or bimolecule are encoded using an embedding layer according to their elements, while the first element in Eq. (1) is the embedding of the CLS atom. \({n}_{\max }\) refers to the maximum number of atoms in a molecule or bimolecule within the database. We use a PAD token to ensure a fixed input size when the number of atoms is less than \({n}_{\max }\).
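
The construction in Eq. (1) can be sketched as a token sequence followed by an embedding lookup. The vocabulary indices, the embedding width of 8, and the random embedding table below are illustrative assumptions (in the actual model the embedding is learned).

```python
import numpy as np

# Hypothetical vocabulary; the index assignment is illustrative, not the authors' mapping.
VOCAB = {"PAD": 0, "CLS": 1, "H": 2, "C": 3, "N": 4, "O": 5, "S": 6}

def tokenize(elements, n_max):
    """Return n_max + 1 token indices: CLS, one token per atom, then PAD up to n_max atoms."""
    if len(elements) > n_max:
        raise ValueError("molecule has more atoms than n_max")
    tokens = [VOCAB["CLS"]] + [VOCAB[e] for e in elements]
    tokens += [VOCAB["PAD"]] * (n_max + 1 - len(tokens))
    return np.array(tokens, dtype=np.int64)

def embed(tokens, table):
    """Embedding lookup: one row of the table per token."""
    return table[tokens]

rng = np.random.default_rng(0)
table = rng.normal(size=(len(VOCAB), 8))     # stand-in for a learned embedding table
tokens = tokenize(["C", "C", "H", "H", "H", "H"], n_max=10)  # ethylene, padded to 10 atoms
x0 = embed(tokens, table)
print(tokens, x0.shape)
```

The resulting array has one row for the CLS atom, one per real atom, and identical PAD rows filling the remaining positions, matching the fixed length \({n}_{\max }+1\) in Eq. (1).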

The initial pair representation is the molecular or bimolecular distance kernel matrix \({{\bf{P}}}^{0}\), where \({P}_{ij}=\sigma ({a}_{ij}{D}_{ij}+{b}_{ij})\), with \({a}_{ij}\) and \({b}_{ij}\) determined by the elemental types of atoms i and j. The L2 distance matrix \({\bf{D}}\) is given by:

$${\bf{D}}={\left(\begin{array}{cccccc}{r}_{{\rm{CLS}},{\rm{CLS}}}&{r}_{{\rm{CLS}},1}&\cdots &{r}_{{\rm{CLS}},n}&\cdots &0\\ {r}_{1,{\rm{CLS}}}&{r}_{1,1}&\cdots &{r}_{1,n}&\cdots &0\\ \vdots &\vdots &\ddots &\vdots & &\vdots \\ {r}_{n,{\rm{CLS}}}&{r}_{n,1}&\cdots &{r}_{n,n}&\cdots &0\\ \vdots &\vdots & &\vdots &\ddots &\vdots \\ 0&0&\cdots &0&\cdots &0\end{array}\right)}_{({n}_{\max }+1)\times ({n}_{\max }+1)}$$
(2)
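
A minimal sketch of Eq. (2) and of the pair kernel \(\sigma (aD+b)\) follows. It places the CLS atom at the unweighted geometric center, zero-pads to \({n}_{\max }+1\), and assumes that σ is a Gaussian basis expansion (suggested by the Gaussian kernel size quoted under model configuration, but not spelled out in the text). The kernel centers, the width, and the scalar a and b are placeholders for learned, element-pair-dependent parameters.

```python
import numpy as np

def distance_matrix_with_cls(coords, n_max):
    """Zero-padded (n_max + 1) x (n_max + 1) distance matrix with CLS at the geometric center."""
    n = len(coords)
    center = coords.mean(axis=0, keepdims=True)          # position of the CLS atom
    points = np.vstack([center, coords])                  # shape (n + 1, 3)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    padded = np.zeros((n_max + 1, n_max + 1))
    padded[: n + 1, : n + 1] = dist
    return padded

def gaussian_pair_kernel(dist, a, b, centers, width):
    """Apply the affine map a * D + b, then expand each entry on K Gaussian basis functions."""
    z = a * dist + b
    return np.exp(-0.5 * ((z[..., None] - centers) / width) ** 2)

coords = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]])     # two atoms, 2 angstrom apart
D = distance_matrix_with_cls(coords, n_max=3)
P0 = gaussian_pair_kernel(D, a=1.0, b=0.0, centers=np.linspace(0.0, 5.0, 16), width=0.5)
print(D)
print(P0.shape)
```

In this sketch the padded entries still receive kernel values, so a real implementation would additionally mask PAD positions before attention; `a` and `b` may also be passed as arrays of shape (N, N) to mimic the element-pair dependence.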

For systems in solution, we concatenate the initial atomic representations \({{\bf{x}}}_{{\rm{solu}}}^{0}\) and \({{\bf{x}}}_{{\rm{solv}}}^{0}\) of the solute and solvent molecules, along with the initial pair representations \({{\bf{P}}}_{{\rm{solu}}}^{0}\) and \({{\bf{P}}}_{{\rm{solv}}}^{0}\) (Figure S5), to form the initial atomic representations:

$$\begin{array}{ll}{{\bf{x}}}^{0}=\left[{\rm{emb}}({\rm{CLS}}),{\rm{emb}}({Z}_{0}^{{\rm{solu}}}),\ldots ,{\rm{emb}}({Z}_{m}^{{\rm{solu}}}),{\rm{emb}}({Z}_{0}^{{\rm{solv}}}),\right.\\\qquad\left.\ldots ,{\rm{emb}}({Z}_{n}^{{\rm{solv}}})\right]_{{n}_{\max }^{{\rm{solu}}}+{n}_{\max }^{{\rm{solv}}}+1}\end{array}$$
(3)

where \({Z}_{i}\) denotes the vocabulary index of the i-th atom in the respective molecule, and \({n}_{\max }^{{\rm{solu}}}\) and \({n}_{\max }^{{\rm{solv}}}\) represent the maximum numbers of atoms in the solute and solvent molecules, respectively. Padding may be applied when the number of atoms in a given system is smaller than the corresponding maximum.

The initial pair representations are:

$${{\bf{P}}}^{0}=\left(\begin{array}{ll}{{\bf{P}}}_{{\rm{solu}}}^{0}&0\\ 0&{{\bf{P}}}_{{\rm{solv}}}^{0}\end{array}\right)$$
(4)

The first element of \({{\bf{x}}}^{0}\), denoted \({{\bf{x}}}_{0}^{0}\), is the initial overall representation of the gas-phase molecule or of the combined solute-solvent system.
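
The concatenation in Eqs. (3) and (4) can be sketched as stacking the atomic representations and assembling a block-diagonal pair matrix. How the CLS atom is shared between solute and solvent is not fully specified in the text; the sketch assumes the single CLS row belongs to the solute block, and it omits padding. The function name is hypothetical.

```python
import numpy as np

def concat_solute_solvent(x_solu, x_solv, p_solu, p_solv):
    """Stack atomic representations and build the block-diagonal pair matrix.

    x_solu: (m, d) array including the CLS row (assumption); x_solv: (n, d) array.
    p_solu: (m, m) and p_solv: (n, n) pair representations.
    """
    x0 = np.concatenate([x_solu, x_solv], axis=0)
    m, n = p_solu.shape[0], p_solv.shape[0]
    p0 = np.zeros((m + n, m + n))
    p0[:m, :m] = p_solu          # solute-solute block
    p0[m:, m:] = p_solv          # solvent-solvent block; cross blocks stay zero
    return x0, p0

rng = np.random.default_rng(0)
x_solu, x_solv = rng.normal(size=(3, 4)), rng.normal(size=(2, 4))
p_solu, p_solv = np.ones((3, 3)), 2.0 * np.ones((2, 2))
x0, p0 = concat_solute_solvent(x_solu, x_solv, p_solu, p_solv)
print(x0.shape)
print(p0)
```

The zero off-diagonal blocks mean that no explicit solute-solvent distances enter the initial pair representation; any coupling between the two molecules is built up by the attention layers.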

Based on the initial atomic and pair representations \({{\bf{x}}}^{0}\) and \({{\bf{P}}}^{0}\), we update the atomic and pair representations with 15 encoder layers (Figure S6). For the l-th layer, we compute the query, value, and key matrices as \({{\bf{Q}}}^{l}={{\bf{x}}}^{l-1}{{\bf{W}}}_{Q}^{l}\), \({{\bf{V}}}^{l}={{\bf{x}}}^{l-1}{{\bf{W}}}_{V}^{l}\) and \({{\bf{K}}}^{l}={{\bf{x}}}^{l-1}{{\bf{W}}}_{K}^{l}\). By aggregating the atomic and pair representations from the (l − 1)-th layer, the atomic and pair representations for the l-th layer are updated as:

$${{\bf{x}}}^{l}={{\bf{x}}}^{l-1}+({\rm{softmax}}\left(\frac{{{\bf{Q}}}^{l}{{{\bf{K}}}^{l}}^{T}}{\sqrt{{d}_{k}}}+{{\bf{P}}}^{l-1}\right){{\bf{V}}}^{l}){{\bf{W}}}_{O}^{l}$$
(5)
$${{\bf{x}}}^{l}={{\bf{x}}}^{l}+{\rm{MLP}}({{\bf{x}}}^{l})$$
(6)
$${{\bf{P}}}^{l}={{\bf{P}}}^{l-1}+\frac{{{\bf{Q}}}^{l}{{{\bf{K}}}^{l}}^{T}}{\sqrt{{d}_{k}}}$$
(7)

The MLP represents the multilayer perceptron. We use \({{\bf{W}}}_{O}\in {{\mathbb{R}}}^{{d}_{v}\times {d}_{{\rm{model}}}}\) to project the output of the attention mechanism to the same dimension as \({{\bf{x}}}^{l}\).
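
Equations (5) to (7) can be traced step by step in a short NumPy sketch. It is deliberately simplified and rests on several assumptions not stated in the text: a single attention head, a 2D pair matrix, a two-layer ReLU MLP without biases, and no layer normalization or masking.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def mlp(x, w1, w2):
    """Two-layer perceptron with ReLU (architecture assumed for illustration)."""
    return np.maximum(x @ w1, 0.0) @ w2

def encoder_layer(x, p, wq, wk, wv, wo, w1, w2):
    """One encoder layer: pair-biased self-attention plus pair update."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])      # Q K^T / sqrt(d_k)
    x = x + softmax(scores + p) @ v @ wo         # Eq. (5): attention biased by P^{l-1}
    x = x + mlp(x, w1, w2)                       # Eq. (6): residual MLP
    p = p + scores                               # Eq. (7): pair representation update
    return x, p

rng = np.random.default_rng(0)
n_tokens, d_model = 5, 8
x = rng.normal(size=(n_tokens, d_model))
p = rng.normal(size=(n_tokens, n_tokens))
wq, wk, wv, wo = (0.1 * rng.normal(size=(d_model, d_model)) for _ in range(4))
w1, w2 = 0.1 * rng.normal(size=(d_model, 16)), 0.1 * rng.normal(size=(16, d_model))
x_new, p_new = encoder_layer(x, p, wq, wk, wv, wo, w1, w2)
print(x_new.shape, p_new.shape)
```

The point of the design is visible in the last two assignments: the pair matrix biases the attention weights in Eq. (5), and the attention logits in turn flow back into the pair matrix in Eq. (7), so geometric and atomic information are updated jointly across layers.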

After processing through L encoder layers and the MLP in our MRL model, the feature of the CLS atom, initialized as \({{\bf{x}}}_{0}^{0}\), aggregates the atomic and pair representations within the molecule or bimolecule. We denote \({{\bf{x}}}_{0}^{L}\) as \({{\rm{CLS}}}_{{\rm{repr}}}\), which serves as the overall representation of the molecule or bimolecule. We pre-train the model on large-scale 3D geometries and TB-level properties to improve the expressiveness of \({{\rm{CLS}}}_{{\rm{repr}}}\). The pre-training includes masked atom prediction and 3D coordinate reconstruction using SE(3)-equivariant networks in the first stage, followed by a second stage of pre-training on a large-scale optoelectronic or transfer integral database.

For downstream property prediction, we use the following model:

$$y={\rm{MLP}}({{\rm{CLS}}}_{{\rm{repr}}})$$
(8)

Alternatively, we can integrate domain-specific features (Fea) with the microscopic representation:

$$y={\rm{MLP}}\left({\rm{concat}}\left({\rm{MLP}}({{\rm{CLS}}}_{{\rm{repr}}}),{\rm{MLP}}({\rm{Fea}})\right)\right)$$
(9)
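
The two prediction heads in Eqs. (8) and (9) differ only in whether domain features are fused with the CLS representation. The sketch below illustrates both; the layer sizes, ReLU activations, random weights, and the number of domain features (6) are placeholders, not the authors' configuration.

```python
import numpy as np

def init_mlp(rng, sizes):
    """Random (weight, bias) pairs for consecutive layer sizes; stand-in for trained parameters."""
    return [(0.1 * rng.normal(size=(i, o)), np.zeros(o)) for i, o in zip(sizes[:-1], sizes[1:])]

def mlp(x, layers):
    """Apply linear layers with ReLU between them and a linear final layer."""
    for idx, (w, b) in enumerate(layers):
        x = x @ w + b
        if idx < len(layers) - 1:
            x = np.maximum(x, 0.0)
    return x

def predict_plain(cls_repr, head):
    """Eq. (8): y = MLP(CLS_repr)."""
    return mlp(cls_repr, head)

def predict_fused(cls_repr, fea, enc_repr, enc_fea, head):
    """Eq. (9): y = MLP(concat(MLP(CLS_repr), MLP(Fea)))."""
    fused = np.concatenate([mlp(cls_repr, enc_repr), mlp(fea, enc_fea)], axis=-1)
    return mlp(fused, head)

rng = np.random.default_rng(0)
cls_repr = rng.normal(size=(4, 512))   # batch of 4 CLS representations (512 = embedding dimension)
fea = rng.normal(size=(4, 6))          # 6 hypothetical domain features per sample
y_plain = predict_plain(cls_repr, init_mlp(rng, [512, 64, 1]))
y_fused = predict_fused(cls_repr, fea,
                        init_mlp(rng, [512, 64]), init_mlp(rng, [6, 16]),
                        init_mlp(rng, [64 + 16, 32, 1]))
print(y_plain.shape, y_fused.shape)
```

Encoding each input with its own MLP before concatenation lets the low-dimensional domain features be rescaled independently of the much wider learned representation, which is one plausible motivation for the form of Eq. (9).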

Model configuration and training

We construct molecular and bimolecular representations with 15 layers and an embedding dimension of 512, using a Gaussian kernel size of 128. Pre-training is performed on 8 Tesla A100 GPUs, taking approximately 20 days to complete. We use the Adam optimizer with a learning rate of 0.001, gradient clipping set to 1.0, and 8 million training steps with 20K warm-up steps. The batch size is 128, and training lasts for 1000 epochs. Hyperparameters for training downstream organic optoelectronic properties and transfer integrals are detailed in Table S7.