Main

In mental health, diagnosis and treatment depend nearly entirely on a syndromal classification that defines diagnostic categories based on symptom clusters1,2,3. However, comorbidity has long been recognized as the rule rather than the exception, with half of the patients that fulfill the diagnostic criteria for one disease also meeting the criteria for at least one other condition4,5.

Common etiologies and symptoms are observed in psychotic disorders and depression, which may indicate overarching, transdiagnostic pathogenetic mechanisms6,7. Hence, a broader approach that acknowledges the complexity of mental illnesses is essential to unravel the determinants of specific psychopathologies8,9,10. Such transdiagnostic approaches may be helpful in advancing therapeutic discovery by enabling symptoms to be investigated on multiple levels of analysis, for example, manifest psychiatric syndromes versus latent psychopathology dimensions. Potential advantages include the ability to classify and treat multiple psychological disorders using the same treatment protocol, as well as the capacity to better address comorbidities.

The interconnection of psychotic disorders and depressive syndromes showcases the above issue. Depressive symptoms can be observed in the prodromal stage before the first manifestation of psychotic symptoms, during the first or later psychotic episodes, after the fading out of an acute episode and as a sign of relapse8. Depression may arise from the neurobiological processes of psychotic disorders9, as a side effect of antipsychotic drugs, as an outward manifestation of problematic symptoms or as a psychological response to psychosis as a significant life experience10. Numerous models have been proposed to explain the connection between depression and psychosis11,12. Despite the growing emphasis on transdiagnostic approaches to mental health, it remains unclear whether distinct diagnostic categories share a common neuronal basis, and current knowledge rests on meta-analyses of studies that were carried out on individual disorders using different methodologies13.

Structural magnetic resonance imaging (sMRI) studies have identified common brain abnormalities in patients with schizophrenia and depression (for example, in cerebral white matter14 or the cerebellum15,16). Volume deficits and structural brain abnormalities have been linked to both schizophrenia17,18 and depression19,20. These changes have also been observed in patients with recent-onset psychosis (ROP)17,21,22,23,24 and recent-onset depression (ROD)25,26,27,28, and have a wide distribution, affecting the frontal, temporal and parietal cortical regions as well as the subcortical, cerebellar, insula and callosal regions29,30,31. Moreover, abnormalities in the white matter (WM) of the brain and cerebrospinal fluid (CSF) have been reported in both psychotic and depressive disorders32,33,34,35. However, it is unclear how the observed brain changes translate to clinical symptoms and how different brain-change profiles correspond to different symptom profiles, that is, whether or not positive and negative symptoms, depressive symptoms and functionality are associated with different brain topologies.

Despite having information from structural brain imaging studies on psychosis that date back more than 40 years, so far it has not been possible to find a diagnostic or predictive biomarker for use in clinical settings. Small sample sizes, ambiguous biomarker definitions and a dearth of replications36 have been suggested to be responsible for these shortcomings. On the basis of brain volumetric changes, cross-sectional studies have reported different symptom clusters (positive and negative symptoms, depressive symptoms and functionality) that are associated with distinct neuroimaging profiles across both diagnoses25,37. In the present analysis, the contrast texture feature map expresses the gray-level differences of non-segmented magnetic resonance images, capturing inter-relationships of the segments and the cortical folding. Combining the captured information with an explainable artificial intelligence algorithm, which explains the classification decision, gives insights regarding the association of the pattern that the algorithm uses for classification with the clinical profile and outcome, grouped in homogeneous clusters of the symptom. The current study seeks (1) to investigate whether or not a transdiagnostic set of structural alterations can characterize ROP and ROD, and if any such alterations may reflect markers of psychiatric illness, and (2) to investigate associations between transdiagnostically individualized brain markers and clinical symptom severity as well as outcome profiles. We used a large training sample of ROP and ROD data and an independent validation sample, both of which are from the Personalized pROgNostic tools for early psychosIs mAnagement (PRONIA) study27. The measures of interest were radiomics texture features, based on previous findings by our group, where the contrast texture feature map has been identified as the key feature for psychosis14,37. 
Mathematically, the contrast feature expresses the shape abnormalities in the brain cortex and the surface relief of the cortical boundary. Finally, we performed explainable artificial intelligence analysis, a set of tools which supports the explanation of the classification decision, to highlight the brain regions that most contributed to differences between the ROD and ROP patients. We hypothesized that (1) ROP and ROD present shared contrast texture feature abnormalities and (2) different contrast feature patterns are associated with different transdiagnostic clinical profiles. The goal of our study was to investigate the usefulness of neuroimaging markers for the transdiagnostic prediction of outcome profiles. In this sense, our results serve mainly as a proof of concept that should be followed up in larger samples. We believe that there is merit in our results as they showcase the feasibility and potential relevance of transdiagnostic approaches.

Results

Sample characteristics

Sociodemographic and clinical characteristics of the three participant groups (ROP, ROD and healthy control (HC)) for the training sample are presented in Table 1. In the training sample, there were no statistically significant differences between ROP, ROD and HC with respect to age, sex and ethnicity. There were statistically significant differences between ROP and ROD with respect to specific psychopathological symptoms and psychosocial functioning (see Table 1). The power analysis on the training sample showed that 94 ROP + ROD and 78 HC is the minimum sample size that results in a comparative fit index greater than 0.9.

Table 1 Clinical and demographic sample details for the training sample at baseline and the independent validation sample

In the validation baseline sample (age- and sex-matched with the training sample—see ‘Training and validation samples’ in the Methods), there were no statistically significant differences between ROP, ROD and HC with respect to age, sex and ethnicity. There were statistically significant differences between ROP and ROD with respect to specific psychopathological symptoms and psychosocial functioning (see Table 1).

Classification results and localization

A repeated nested pooled cross-validation classifier of brain contrast texture maps achieved a balanced accuracy of 72.43%, a sensitivity of 74.40% and a specificity of 74.45% for discrimination of patients (pooled ROD and ROP) from the HC participants (Table 2). A total of 231 age- and sex-matched participants was included in the independent validation sample, with 139 participants classified correctly as ROD + ROP (balanced accuracy 60.27%, sensitivity 61.20% and specificity 58.65%). In the independent validation follow-up sample, 137 participants were included, where 77 participants were classified correctly and 60 of them provided all the clinical variables.
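For readers less familiar with these performance measures, the following minimal sketch shows the conventional definitions of sensitivity, specificity and balanced accuracy computed from confusion-matrix counts. The function name and the counts are hypothetical and chosen for illustration only; they are not taken from Table 2, and how the reported values were aggregated across cross-validation repetitions is not described in this section.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, balanced accuracy) from confusion counts."""
    sensitivity = tp / (tp + fn)   # proportion of patients labelled as patients
    specificity = tn / (tn + fp)   # proportion of controls labelled as controls
    balanced_accuracy = (sensitivity + specificity) / 2
    return sensitivity, specificity, balanced_accuracy


# Hypothetical counts, for illustration only.
sens, spec, bacc = classification_metrics(tp=60, fn=40, tn=45, fp=55)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} balanced accuracy={bacc:.3f}")
```

Balanced accuracy is often preferred over raw accuracy when the patient and control groups differ in size, because each group contributes equally to the score.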

Table 2 Classification results

Implementation of the layer-wise relevance propagation (LRP) algorithm for multilayer neural networks, as described in Bach et al.38, showed that voxels with the highest contribution to the difference between patients and HC participants (that is, those with the highest relevance for the classification decision) were located in the cerebellum, anterior and posterior midline areas, frontotemporal areas and right insula (Fig. 1 and Supplementary Tables 5 and 6). In addition, we investigated cortical biomarkers for the general psychopathology in gray matter (GM), WM and CSF. The regions of interest were extracted using the AAL-VOIs atlas (https://doi.org/10.1006/nimg.2001.0978). We used the Johns Hopkins University (JHU) WM tractography atlas39 to label the WM tracts identified by the LRP.
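To illustrate the idea behind relevance propagation, the sketch below applies an epsilon-stabilized LRP rule to a tiny, bias-free ReLU network with a single output unit. This is an assumption-laden toy example: the network size, the weights, the choice of the epsilon rule and the function name `lrp_epsilon` are all hypothetical, and the network topology and propagation rule used in the study are not stated in this section.

```python
import numpy as np


def lrp_epsilon(weights, x, eps=1e-6):
    """Redistribute the output score of a bias-free ReLU network onto its inputs."""
    # Forward pass, keeping the input to every layer.
    activations = [x]
    for i, W in enumerate(weights):
        z = activations[-1] @ W
        is_last = i == len(weights) - 1
        activations.append(z if is_last else np.maximum(z, 0.0))

    # Backward pass: start from the output score and redistribute it layer by layer.
    relevance = activations[-1]
    for W, a in zip(reversed(weights), reversed(activations[:-1])):
        z = a @ W
        z = z + eps * np.where(z >= 0, 1.0, -1.0)  # stabilizer avoids division by zero
        s = relevance / z
        relevance = a * (s @ W.T)                  # share of each input in each unit
    return relevance, float(activations[-1][0])


# Hypothetical 3-input, 2-hidden-unit, 1-output network.
W1 = np.array([[1.0, -1.0], [0.5, 2.0], [-1.0, 0.5]])
W2 = np.array([[2.0], [1.0]])
x = np.array([1.0, 2.0, 3.0])

relevance, score = lrp_epsilon([W1, W2], x)
print(relevance, relevance.sum(), score)
```

In this toy setting the input relevances sum (up to the stabilizer) to the output score, which is the conservation property that makes the resulting heatmap interpretable as a decomposition of the classification decision.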

Fig. 1: Voxels with the red color contribute more to the classification decision.

The relevance values of the voxels calculated with the LRP are overlaid and rendered on a GM template using the MRIcron toolbox.

Brain relevance association with clinical outcome profile

The association of the brain relevance with clinical symptoms and outcome profiles was investigated in the validation sample. Clinical variables at baseline were available for 131 patients from the validation sample (out of 139 classified correctly) and 71 patients of the follow-up sample. We sought to identify patterns in the brain relevance heatmaps that were associated with the severity of clinical symptoms and outcome profiles. We applied the affinity propagation (AP) algorithm40 to cluster the brain relevance heatmaps of the validation sample blindly14,37, resulting in eight clusters. Inside each cluster, we calculated (1) the SANS factor scores (for the avolition, asociality, anhedonia, alogia and blunted affect subgroups), GAF and BDI total scores as well as the PANSS factor scores (for the positive, negative, distress, excitement and disorganization subgroups) at baseline (Fig. 2) and (2) the change scores of these variables at the one-year follow-up (Fig. 3). The resulting clusters are presented in Figs. 2 and 3. We summarize the findings here on the basis of the symptom sum scores, as described above, and the outcome profile (details in Supplementary Table 7). Cluster 1 (number of subjects (N), N = 9 at baseline and N = 6 at follow-up) presented high BDI and PANSS_negative values, an improvement in PANSS_negative over time and a deterioration in PANSS_disorganization and the SANS scores related to alogia and blunted affect over time (Supplementary Fig. 3). In contrast to Cluster 1, Cluster 3 (N = 21 at baseline and N = 12 at follow-up) presented low BDI values and improvement in PANSS_disorganization as well as a high deterioration of the SANS_asociality and SANS_anhedonia scores and the BDI after 1 year. We speculate that the brain relevance in Cluster 3 has prognostic power for depression.
Cluster 2 (N = 8 at baseline and N = 5 at follow-up) presented a high value for the GAF and low values for the PANSS_negative and SANS factor scores, with a marked improvement in the BDI and SANS_blunted affect and a deterioration in GAF, SANS_anhedonia and SANS_avolition over time. In contrast to Cluster 2, Cluster 4 (N = 12 at baseline and N = 6 at follow-up) presented low values in total GAF and high values in the SANS factor scores, with marked improvement in GAF, SANS_asociality and SANS_alogia over time and marked deterioration of the PANSS scores over time. We speculate that the negative symptoms at baseline were a secondary effect of the primary positive symptoms. Cluster 6 (N = 31 at baseline and N = 17 at follow-up) and Cluster 7 (N = 6 at baseline and N = 5 at follow-up) captured older and younger ages, respectively. Cluster 7 presented a marked improvement in the SANS_avolition and SANS_anhedonia scores. Finally, Cluster 5 (N = 22 at baseline and N = 9 at follow-up) presented a low SANS_blunted affect score and deterioration in the PANSS scores over time, and Cluster 8 (N = 22 at baseline and N = 11 at follow-up) showed a strong improvement in the PANSS_positive scores over time.
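The clustering step can be sketched as follows. This is a compact, illustrative implementation of the standard affinity propagation message-passing updates, not the code used in the study. The similarity measure (negative squared Euclidean distance), the median preference, the damping value and the one-dimensional toy data standing in for relevance heatmaps are all assumptions made for the example.

```python
import numpy as np


def affinity_propagation(S, damping=0.7, n_iter=300):
    """Cluster items from a similarity matrix S whose diagonal holds the preferences."""
    n = S.shape[0]
    rows = np.arange(n)
    R = np.zeros((n, n))  # responsibilities
    A = np.zeros((n, n))  # availabilities
    for _ in range(n_iter):
        # Responsibility update: compare each candidate with its best competitor.
        AS = A + S
        best = AS.argmax(axis=1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, best] = S[rows, best] - second
        R = damping * R + (1 - damping) * R_new
        # Availability update: accumulate positive support for each candidate.
        Rp = np.maximum(R, 0)
        Rp[rows, rows] = R[rows, rows]
        A_new = Rp.sum(axis=0)[None, :] - Rp
        diag = A_new[rows, rows].copy()
        A_new = np.minimum(A_new, 0)
        A_new[rows, rows] = diag
        A = damping * A + (1 - damping) * A_new
    exemplars = np.flatnonzero(np.diag(A + R) > 0)
    if exemplars.size == 0:
        raise RuntimeError("no exemplars found; try a different preference or damping")
    labels = S[:, exemplars].argmax(axis=1)
    labels[exemplars] = np.arange(exemplars.size)
    return labels, exemplars


# Toy one-dimensional "heatmap summaries" forming two well-separated groups.
points = np.array([0.0, 0.3, 0.7, 1.2, 10.0, 10.4, 10.9, 11.1])
S = -(points[:, None] - points[None, :]) ** 2
off_diagonal = ~np.eye(len(points), dtype=bool)
np.fill_diagonal(S, np.median(S[off_diagonal]))  # shared preference for all items

labels, exemplars = affinity_propagation(S)
print(labels, points[exemplars])
```

A property of this algorithm that matters for the present analysis is that the number of clusters is not fixed in advance; it emerges from the similarities and the preference value, so a different preference can yield a different number of clusters.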

Fig. 2

The average values of the clinical variables at baseline calculated in each cluster for the participants belonging to the independent validation sample. Different colors represent the different clusters.

Fig. 3

The average values of the change in symptoms one year after baseline, calculated in each cluster. Different colors represent the different clusters.

In clusters with a sufficient number of participants (Clusters 3, 5, 6 and 8), we repeated regression analyses to predict clinical outcomes at baseline and changes in the scores between baseline and follow-up, using the mean relevance of the brain heatmap as a predictor (in Table 1, the mean and standard deviation scores of the ROP and ROD participants in the follow-up are presented). In Cluster 3, the mean relevance map predicted an improvement of the PANSS score factors (positive, disorganization, distress and excitement) and the SANS_anhedonia score with negative coefficients (Supplementary Table 8.1). In Cluster 8, the mean relevance map predicted an improvement of the PANSS_positive score factor with a positive coefficient (Supplementary Table 8.2). In addition, we tested the predictive value of the relevance of the voxels that contributed most to the classification decision. There were differential associations between voxel relevance and symptoms in the different clusters, especially for Clusters 3, 5, 6 and 8. In Cluster 3, the voxels in specific brain areas that are associated with low depressive symptoms (that is, fusiform, temporal lobe and insula from the left hemisphere; see Supplementary Table 7) predicted PANSS scores (for the excitement, disorganization and positive subgroups) and an improvement of functionality. In Cluster 5, the brain relevance in the left temporal lobe and in the right-hemisphere calcarine, lingual gyrus and precuneus regions predicted the improvement of functionality. In Cluster 6, the brain relevance associated with high age in the left occipital and temporal lobes and the fusiform gyrus predicted the functionality at baseline. In Cluster 8, the brain relevance in the left temporal lobe predicted the improvement of functionality.
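As a simplified illustration of the regression step described above, the sketch below fits an ordinary least-squares line with the mean relevance of each participant's heatmap as the single predictor of a change score. The data values and the function name `simple_ols` are hypothetical, and the study's actual models may have differed (for example, in covariates or estimation procedure), which this section does not specify.

```python
def simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical values for four participants of one cluster.
mean_relevance = [0.1, 0.2, 0.3, 0.4]   # mean heatmap relevance per participant
score_change = [-1.0, -3.0, -5.0, -7.0]  # follow-up minus baseline symptom score

slope, intercept = simple_ols(mean_relevance, score_change)
print(f"slope={slope:.2f} intercept={intercept:.2f}")
```

The sign of the fitted slope corresponds to the positive or negative coefficients reported in the Supplementary Tables; whether a negative change denotes improvement depends on the direction of the clinical scale.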

Furthermore, we calculated the volumes of the regions with predictive value to check whether the contribution of the voxels to the classification decision was driven by volumetric changes. In all clusters with high numbers of participants (that is, Clusters 3, 5, 6 and 8), the brain volume in the detected regions was uncorrelated with the classification relevance of the respective regions, with the exception of the precentral and frontal lobe regions, which do not contribute to the statistical models (Fig. 4).

Fig. 4: The correlation between the brain volume and the relevance of the voxels that contribute more to the classification decision in clusters that contain more than 20 participants.

Different colors represent the different clusters. Brain region labels were extracted using the AAL-VOIs atlas. L, left; R, right; Sup, superior; Mid, middle; Inf, inferior; Oper, operculum; Tri, triangularis; Med, medial; Orb, orbital; Ant, anterior; Post, posterior; Crus1, Crus 1 lobule.

Discussion

In this study, across patients with ROD or ROP we identified eight clusters of brain heatmap patterns that showed differential associations with symptom clusters, functionality and age; that is, clinical heterogeneity in transdiagnostic psychopathology was reflected in homogeneous clusters of contrast texture brain changes. The brain regions contributing most to the classifier decision overlap with those implicated in psychotic and depressive disorders in studies that have assessed changes in the GM and WM of patients37. This fact and the proof-of-concept results of this study support our notion that neuroimaging biomarkers are useful for the prediction of transdiagnostic outcome profiles. We used the contrast feature map and the LRP method suggested by Bach et al.38 to train and explain a classifier for identification of the transdiagnostic psychopathology. The model showed good accuracy in classifying a combined group of ROD and ROP patients against HC participants in the training sample, which decreased in the external validation sample. However, the innovative finding of this study is the creation of homogeneous clusters that are associated with clinical severity and outcome profiles. The current analysis provides an example of how the proposed method can be applied to link the transdiagnostic symptoms (rather than categorical diagnoses) to brain structure. Our ultimate vision pertains to multidimensional prediction, investigating neuroanatomical markers of longitudinal changes in symptoms and functionality across diagnoses.

Changes in the hippocampus, insula and lateral prefrontal cortex have been observed in patients with ROD (for example, a decrease in volume in medial temporal regions)27 and ROP (for example, a volume deficit in the anterior hippocampus)41, which emphasizes the implication of brain alterations in the early stages of both illnesses. Using our method, GM analysis in transdiagnostic psychopathology showed brain changes in regions that have been reported in other studies, for example, the cerebellum20, the frontal lobe42, temporal regions43, the cingulum bundle44, the precuneus45, gyrus rectus46, insula47, Heschl’s gyrus48, lingual gyrus49 and left putamen50. The explained classification results using the contrast feature maps indicate the posterior cingulate cortex, gyrus rectus, third ventricle (consistent with our previous results in patients with schizophrenia and major depressive disorder14) and insula (consistent with previous studies of individuals at clinical high risk who transitioned to psychosis and of patients with first-episode psychosis37,51) as key regions for the transdiagnostic classification of psychopathology (see Table 3 for further details).

In people with mental-health problems, there is mounting evidence of decreased WM function that results in problems of synchronization and connectivity52, with fractional anisotropy (FA) evaluated via diffusion tensor imaging being the most extensively used measure. Throughout the phases and progression of psychosis, neural alterations, particularly changes in WM connectivity, can be seen53. Consistent findings of significantly lower FA in the left genu of the corpus callosum, the left anterior corona radiata and the right superior longitudinal fasciculus have been reported in ROP patients relative to HC participants33,54,55. Functional dysconnectivity in the dorsal anterior cingulate cortex has been revealed in major depression56. Expansion of the third ventricle has been reported in recent-onset psychosis57. Other studies have reported abnormalities in WM integrity as measured by FA in major depressive disorders34. The above findings are in line with our results regarding the WM changes in the frontal gyrus, gyrus rectus, dorsal anterior cingulate cortex and corpus callosum. The third and fourth ventricles have been reported as regions that contribute to the identification of the transdiagnostic psychopathology using the contrast texture map.

The brain relevance calculated using the LRP algorithm resulted in clusters that showed differential associations with age, positive, negative and depressive symptom clusters as well as functionality. Previous studies that used sMRI-based prediction models predicted social functioning in patients with ROD27. In some clusters, mean relevance values and/or the relevance of voxels contributing significantly to the classification decision accurately predicted the PANSS and SANS sum scores, functionality and change in functionality over time. Significant contributing voxels were located in regions that have previously been associated with these scores. In previous studies on schizophrenia and ROD, the FA of the temporal part and the temporolimbic GM were positively correlated with the GAF score, consistent with the findings in Cluster 6, which includes 17 ROP and 14 ROD patients at baseline27,58. A reduction of GM in the fusiform gyrus and temporal lobe was associated with PANSS scores in schizophrenia59, as we observed in Cluster 3, which includes mainly ROP patients. An improvement in functionality has also been associated with an increased GM volume in the temporal lobes at baseline in a previous study60. Our proposed method yielded additional, new findings, which should be validated further in different datasets: for example, the right-hemisphere calcarine, lingual and precuneus regions for the prediction of functionality at follow-up, the fusiform gyrus and insula for the prediction of PANSS scores at baseline, and the occipital lobe and fusiform gyrus for the prediction of GAF scores at baseline.

The innovation of this study is the development and validation of individualized transdiagnostic models that give insights regarding the relation of brain changes to clinical symptoms and functional outcome. The identified clusters were derived using sMRI of recent samples and may eventually lead to the development of MRI-based predictive and decision-making tools. Given that the heterogeneity of mental disorders has hindered the search for biomarkers so far5,26,27, we created homogeneous clusters based on brain relevance patterns. We found groups with different brain alteration profiles, which corresponded to different transdiagnostic clinical profiles while also showing distinct association profiles between clinical symptom clusters and anatomy. Larger studies are warranted to investigate the optimal number of stable clusters; for the purposes of this study, it is important to show that MRI can potentially be used to form a clinical categorization into more homogeneous groups than those offered by categorical classification systems3,61,62. Distinct psychopathological profiles may be established to help distinguish patients with different syndromes, independent of other diagnostic considerations, and relate these to symptoms and clinical outcomes. Models that emphasize the role of changeable transdiagnostic illness mechanisms can help to further these efforts, promoting the generalizability of evidence-based treatments to routine care settings by accommodating comorbidities.

The extraction of texture feature maps from unsegmented brain images provides insight into the inter-relationships of different modality voxels. To our knowledge, non-segmented images have not previously been used to detect the examined disorders, because the interaction between GM, WM and CSF has not been considered as an indicator of diagnosis; this suggests a novel biomarker14,37. The main advantages of the proposed method are the interpretability of the results and the use of non-segmented images, which eliminates segmentation errors. The method assumes that changes in contrast reveal microstructure changes when calculated in small three-dimensional cubes, and that the image intensities reveal inter-relations between GM, WM and CSF. The radiomic texture feature maps were extracted in MNI space (see Methods) from a registered masked T1-weighted image. The radius of the cubes and the topology of the neural network, for example, should be investigated further. For the replication of results, we used the same cube size and preprocessing methods as in previous papers14,37. Future studies should investigate a range of textural qualities, as each one reflects a different component of brain diversity. Finally, diverse methodological techniques should be examined further to acquire a better understanding of neurobiological variations across disorders and to make the results useful for targeted interventions and treatment alternatives. Future studies in larger samples should investigate whether or not several neuroimaging modalities may be combined and used for more accurate prediction, and whether or not the regions involved represent functional networks.
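To make the notion of a contrast value computed in a small three-dimensional cube more concrete, the sketch below computes the contrast of a gray-level co-occurrence matrix (GLCM), that is, the sum of (i - j)^2 p(i, j) over gray-level pairs (i, j). This is an assumption: the text above does not state which contrast definition, neighbor offset, gray-level quantization or cube size was used, and the function name `glcm_contrast` and the toy cubes are hypothetical.

```python
import numpy as np


def glcm_contrast(cube, offset=(0, 0, 1), levels=8):
    """GLCM contrast of one 3D cube of integer gray levels in [0, levels)."""
    dz, dy, dx = offset  # non-negative neighbor offset along each axis
    z, y, x = cube.shape
    # Pair every voxel with its neighbor at the given offset.
    ref = cube[: z - dz, : y - dy, : x - dx]
    nbr = cube[dz:, dy:, dx:]
    glcm = np.zeros((levels, levels))
    np.add.at(glcm, (ref.ravel(), nbr.ravel()), 1)  # count gray-level pairs
    p = glcm / glcm.sum()                            # joint probability of pairs
    i, j = np.indices((levels, levels))
    return float(((i - j) ** 2 * p).sum())


# A homogeneous cube has no gray-level differences between neighbors.
flat = np.zeros((3, 3, 3), dtype=int)
# A striped cube alternates between two gray levels along the last axis.
striped = np.zeros((3, 3, 4), dtype=int)
striped[:, :, 1::2] = 1

print(glcm_contrast(flat), glcm_contrast(striped))
```

Sliding such a cube over the whole image and storing one value per position would yield a feature map; large values mark locations where neighboring voxels differ strongly in intensity, such as tissue boundaries.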

Limitations

Currently, psychiatric diagnosis is not based on underlying brain structures or clear biological etiology. Patients whose symptoms may be caused by different biological processes may receive the same diagnosis, and patients whose symptoms may be caused by the same biological process may receive different diagnoses. This practice may adversely affect the development of outcome predictions and explain the limited transferability of the results. Results must be interpreted with a clear understanding of any limitations, such as the cross-sectional and longitudinal nature of our data, which is important in interpretation because there may be dynamic and changing symptom cluster profiles that are not captured. The extraction of texture features is affected by variation in MRI intensity standardization. Another methodological drawback of the proposed method lies in the present lack of consensus regarding the image normalization method applied within the process for the extraction of texture features. It is important to note that, although the validation sample was recruited within the same study as the discovery sample, not all patients in the validation sample were recruited from the same sites as the discovery patients. Thus, environmental factors may have affected the predictive accuracy of our models. Because of this limitation, we emphasize that the present results represent a proof of concept, and that further validation in larger and geographically diverse cohorts is needed. Future studies should investigate thoroughly the stability and generalizability of the model as well as test the potential application of the model to clinical practice. Further investigation is needed on whether or not several neuroimaging modalities may be combined and used for more accurate prediction, as well as whether or not the regions involved represent functional networks.

Conclusions

The present analysis confirms that, at a symptom cluster level, the psychopathological profile of ROP and ROD follows similar patterns that can be clustered in homogeneous groups associated with specific brain alterations. We identified grouped transdiagnostic neuroanatomical signatures, which were strongly related to homogeneous clinical severity and outcome profiles. The present findings support a hypothesis of common psychopathological profiles of ROP and ROD and recognition of the outcome profile that could inform the development of novel targeted and repurposed therapies that target specific symptom clusters rather than specific diagnoses.

Methods

Written informed consent was obtained from all participants and, in the case of underage participants (defined as those under 18 years of age at all sites), from their guardians before inclusion in the study. The trial was registered with the German Clinical Trials Register (DRKS00005042), and local research ethics boards in each location gave their approval (Ludwig-Maximilian University Munich (ethics ID: 351-13), University of Basel (ethics ID: M12/99), University of Cologne (ethics ID: 13-236), University of Turku (ethics ID: 99/1810/2013), University of Bari (ethics ID: 4754), University of Milan (ethics ID: N.PROT.0001885|P|GEN/02), University of Udine (ethics ID: 67172), University of Birmingham (ethics ID: 14/WM/0019), University of Münster (ethics ID: 2016-398-b-S) and University of Düsseldorf (ethics ID: 5957A)). Participants could claim transport costs. The STROBE (strengthening the reporting of observational studies in epidemiology) checklist is included in Supplementary Fig. 1.

Study design

Data were collected in the context of the PRONIA study, a seven-center study funded by the Seventh Framework Programme of the European Union aimed at optimizing candidate biomarkers for the prediction of mental disorders in early states of the disease. The PRONIA study sites, recruitment protocol and quality-control procedures have been described in detail in previous publications26,27. We refer the reader to Koutsouleris et al.27 for a detailed description of the study design and methodology.

Participants

Data were collected following the standardized recruitment and assessment protocol of the PRONIA study across seven European sites for the training sample (Ludwig-Maximilian University Munich (209 consenting participants), University of Basel (98 consenting participants), University of Cologne (143 consenting participants), University of Turku (93 consenting participants), University of Milan (42 consenting participants), University of Birmingham (84 consenting participants) and University of Udine (92 consenting participants) (Supplementary Table 1)) and across ten European sites for the validation sample (Ludwig-Maximilian University Munich (158 consenting participants), University of Basel (41 consenting participants), University of Cologne (65 consenting participants), University of Turku (67 consenting participants), University of Milan (47 consenting participants), University of Birmingham (40 consenting participants), University of Udine (51 consenting participants), University of Düsseldorf (31 consenting participants), University of Münster (45 consenting participants) and University of Bari (26 consenting participants)). Race and ethnicity were reported to the interviewer based on the individual’s identification. In both samples across diagnoses, white people comprised more than 77% of the total. In brief, participants were recruited through outpatient and inpatient services. The study participants analysed in the present study were recruited following a standardized recruitment and ascertainment protocol. The observational study protocol involved follow-up examinations every three months after the index ascertainment. Regular inter-rater reliability tests were performed to ensure reliability of the GAF scales and PANSS scores across the study sites. PRONIA investigators were repeatedly trained by one of the authors of the GAF and PANSS scales, A. Auther, who was independently testing the PRONIA consortium using four written transcripts of interviews from Zucker Hillside Hospital, USA. Thirty-six PRONIA raters performed an intraclass correlation (ICC) analysis to measure the between-rater agreement on the target measures. For each reliability test the raters had to generate six functioning scores: the current, lowest and highest in the past year for the social and role functioning domain (GAF:S/GAF:R) and for the PANSS scores. The ICC analysis results for the GAF scores are presented in Supplementary Table 2. Cicchetti63 gives the following guidelines for interpretation of the kappa or ICC inter-rater agreement measures: <0.40, poor; 0.40–0.59, fair; 0.60–0.74, good; 0.75–1.00, excellent. The psychopathology of each participant was assessed by trained clinicians. Calibration of the PANSS scores (ICC = 0.79) was performed by implementing inter-rater reliability tests across the study sites. The inter-rater reliability test for the SANS scores is not available64. The general inclusion criteria were as follows: (1) age of 15–40 years, (2) sufficient language skills for participation and (3) the ability to provide informed consent. The general exclusion criteria were current or previous head trauma with loss of consciousness (>5 min), current or past known neurological or somatic disorders that potentially affect the structure or functioning of the brain, current or past alcohol dependence, polysubstance dependence within the past six months and any medical indication against MRI.
The following were all additional exclusion criteria for the HC participants: any current or previous Axis I or II disorder as defined in the Diagnostic and Statistical Manual of Mental Disorders 4th Edition (DSM-IV); a positive family history for affective or non-affective psychoses in first-degree relatives; and psychotropic medication intake more than five times per year and in the month before inclusion in the study.
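The interpretation bands quoted above from Cicchetti63 can be written as a small lookup, sketched below. The function name is hypothetical, and the treatment of values that fall between the published bands (for example, 0.595) as half-open intervals is an assumption of this sketch.

```python
def interpret_icc(icc):
    """Map a kappa or ICC value to the qualitative bands given by Cicchetti."""
    if icc > 1.0:
        raise ValueError("agreement coefficients cannot exceed 1")
    if icc < 0.40:
        return "poor"
    if icc < 0.60:
        return "fair"
    if icc < 0.75:
        return "good"
    return "excellent"


# The PANSS calibration value reported in the text.
print(interpret_icc(0.79))
```

Under these bands, the PANSS calibration value of 0.79 reported above falls in the highest agreement category.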

ROP participants had to meet the criteria for a first affective or non-affective psychotic episode, including the transition criteria established by Yung et al.65 and the structured clinical interview for DSM-IV-TR (SCID)66. They also had to have started their initial antipsychotic medication treatment less than three months earlier. The specific ROP exclusion criteria were psychosis onset more than 24 months ago and antipsychotic use for longer than 90 days (cumulatively in the past 24 months) at a daily dose at or above the minimum dosage of the ‘first-episode psychosis’ range of the German Society for Psychiatry, Psychotherapy and Nervous Diseases (DGPPN) S3 guidelines, equivalent to 5 mg olanzapine.

Patients with ROD were required to have met the SCID-established serious depression criteria within the previous three months. The specific ROD exclusion criteria were any DSM-IV-TR major depressive episode that preceded the current or recent episode, a duration of more than 24 months for the current episode or the use of antipsychotic medication for more than 30 days, as described above.

Training and validation samples

The total training sample of 435 age- and sex-matched participants included 116 with ROD, 122 with ROP (minimally treated first-episode psychosis), and 197 HC, recruited during the first part of the recruitment period. All models were validated in the same participants at follow-up after 12 months, as well as in an independent validation sample (137 ROD, 162 ROP and 178 HC) recruited during the second part of the recruitment period. The follow-up sample consisted of 100 patients with ROP and 103 patients with ROD. Matching the validation sample to the discovery sample for sex and age resulted in a final independent validation sample of 94 ROD, 137 ROP and 159 HC participants, which was used in this study (details in Supplementary Fig. 1). The follow-up sample consisted of 81 patients with ROP and 48 patients with ROD. We ran a power analysis to estimate the minimum sample size needed to detect the specified hypothetical effect size.

Assessments

Data for the present analysis included the following: demographic and clinical information (age, sex and medication exposure), Beck Depression Inventory-II (BDI-II)67, positive and negative syndrome scale (PANSS)68, global assessment of functioning (GAF; global functioning-role scales)69 and the scale for the assessment of negative symptoms (SANS). We calculated five subscores for PANSS (positive, negative, disorganized, excitement and distress) based on the factor analysis by Wallwork and co-workers70. Several factor-analytic studies have suggested that a five-factor model better captures the PANSS structure in schizophrenia samples70. To verify that the consensus factor structure proposed by Wallwork et al. (2012)70 was suitable for our mixed sample of ROP and ROD patients, we performed confirmatory factor analysis. The results confirmed that the five PANSS factors had an acceptable fit to the data, which was reflected in a comparative fit index greater than 0.9 for all five factors (Supplementary Table 3). For the GAF, we used ratings based exclusively on the level of functioning; symptoms were not considered. The SANS factor scores provide a more detailed and differentiated picture of clinical symptoms, some of which are not included in the PANSS negative factor, while others may have varying courses. To verify that the consensus factor structure proposed by Strauss et al. (2018)71 was suitable for our mixed sample of ROP and ROD patients, we performed confirmatory factor analysis. The results confirmed that the hierarchical model of the SANS factors had an acceptable fit to the data, which was reflected in a comparative fit index greater than 0.9 for all four factors, and showed significance for the anhedonia subgroup (Supplementary Table 4).

All participants underwent sMRI25,26. In the training sample, 14 ROP, 7 ROD and 13 HC participants were scanned at a field strength of 1.5 T, as were 12 ROP, 9 ROD and 9 HC participants in the validation sample; all others were scanned at 3 T. For the current analysis, we used T1-weighted sMRI images of the ROD and ROP participants. In keeping with real-world scanner heterogeneity and as part of the larger PRONIA goals, the PRONIA sites were required (1) to acquire isotropic or nearly isotropic voxel sizes with a resolution of at least 1 mm, (2) to set the field-of-view parameters to ensure full three-dimensional coverage of the brain, including all parts of the cerebellum and (3) to define the repetition time and echo time as well as other acquisition parameters. At every site all images were equally distributed across the field strength, visually inspected, automatically defaced and anonymized using an in-house FreeSurfer-based script (https://surfer.nmr.mgh.harvard.edu) before the data were centralized. Supplementary Table 1 lists the scanner and parameter details of the structural magnetic resonance sequences used to examine the PRONIA sample participants. See Supplementary Section 1.1 and the previous PRONIA report26 for full MRI harmonization and data acquisition parameters.

Analysis

Preprocessing

All images were visually inspected, automatically defaced and anonymized using a FreeSurfer-based script before data centralization. Subsequently, we used the open-source CAT12 toolbox (version r1155; CAT12), an extension of SPM12 (Statistical Parametric Mapping: The Analysis of Functional Brain Images, SPM12), to segment the images into GM, WM and CSF maps and to high-dimensionally register the segmented images to the stereotactic space of the Montreal Neurological Institute coordinates (MNI-152 space). The CAT12 processing steps consisted of spatial filtering, segmentation, segmentation estimation, a local adaptive segmentation step (which adjusts the images for WM inhomogeneities and varying GM intensities) and a high-dimensional Diffeomorphic Anatomical Registration Through Exponentiated Lie Algebra (DARTEL) registration of the image to an MNI template in the IXI database (IXI). The manual of the CAT12 toolbox details the processing steps applied to the structural images. The quality of the GM volume maps was checked using the established quality-assurance framework of the CAT12 toolbox (details in Supplementary Section 1.1)37. Images were removed if artifacts were identified by the CAT12 quality-assurance framework. The non-segmented magnetic resonance images, that is, the wp0* images, were used for further analysis. We used histogram equalization to adjust the contrast of each grayscale image, because most pixel values of the original images lie in the middle of the intensity range and therefore present low contrast. The histeq function in MATLAB R2020 produces an output image with pixel values evenly distributed throughout the grayscale range. The texture feature maps were then extracted from the transformed wp0* images (details in Supplementary Section 1.2 and Supplementary Fig. 2).
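The effect of histogram equalization can be illustrated with a minimal sketch. This is a simplified textbook version that maps each gray level through the normalized cumulative histogram; it is not the MATLAB histeq implementation used in the study, and the function name equalize_histogram, the toy pixel values and the 256-level assumption are hypothetical.

```python
def equalize_histogram(pixels, levels=256):
    """Map integer gray levels in [0, levels - 1] through the normalized
    cumulative histogram (textbook histogram equalization)."""
    # Count how often each gray level occurs.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Build the cumulative distribution function of the gray levels.
    total = len(pixels)
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running / total)

    # Stretch each pixel to the full range according to its CDF value.
    return [round(cdf[p] * (levels - 1)) for p in pixels]


# Toy "image" (flattened): all values crowd the middle of the range.
low_contrast = [100, 101, 101, 102, 102, 102, 103, 104]
equalized = equalize_histogram(low_contrast)
print(equalized)
```

In this toy case the original values span only 5 gray levels, whereas the equalized values spread across most of the 0–255 range while preserving their order.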

Feature extraction

We extracted texture feature maps from the non-segmented images using the two-dimensional gray-level co-occurrence matrix (GLCM) calculated in each cube (N = 3,294 feature maps in total). Based on a previous study by Korda and colleagues37, we extracted the texture features ‘entropy’, ‘sum of entropy’, ‘difference of entropy’, ‘energy’, ‘contrast’ and ‘homogeneity’. We focused the analysis on GLCM contrast, which gives a low weight to elements with similar gray-level values and a high weight to elements with dissimilar gray levels, indicating large differences between neighboring voxels37. In our previous independent study, contrast feature maps yielded the highest accuracy for the identification of first-episode psychosis patients37. Contrast captures large differences in gray levels within a small neighborhood. The largest differences are observed at the cortical boundary and at the borders between segments. Smaller differences observed within brain regions reveal shape abnormalities that are not always related to volumetric changes. For this reason, we selected contrast feature maps for further investigation. The contrast of the gray-level pairs reflects intracortical myelin, as has been investigated for patients with schizophrenia in low-sensory and motor areas72 (see Supplementary Fig. 3 for the whole preprocessing pipeline).
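To make the contrast feature concrete, the following minimal sketch computes the standard GLCM contrast, the sum over gray-level pairs (i, j) of (i − j)² weighted by the co-occurrence probability p(i, j), for a single two-dimensional patch. The function name glcm_contrast, the horizontal neighbor offset, the four gray levels and the toy patches are illustrative assumptions and do not reproduce the cube-wise pipeline of the study.

```python
def glcm_contrast(image, levels, offset=(0, 1)):
    """Contrast of the normalized gray-level co-occurrence matrix (GLCM)
    of a 2D image with integer gray levels in [0, levels - 1]."""
    d_row, d_col = offset
    rows, cols = len(image), len(image[0])

    # Count co-occurrences of gray-level pairs at the given offset.
    glcm = [[0] * levels for _ in range(levels)]
    pairs = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < rows and 0 <= c2 < cols:
                glcm[image[r][c]][image[r2][c2]] += 1
                pairs += 1
    if pairs == 0:
        return 0.0

    # Weight each pair by its squared gray-level difference.
    return sum(
        (i - j) ** 2 * glcm[i][j] / pairs
        for i in range(levels)
        for j in range(levels)
    )


# Homogeneous patch: neighbors are identical, so contrast is zero.
smooth = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
# Checkerboard patch: every horizontal neighbor pair differs by 3 levels.
edgy = [[0, 3, 0], [3, 0, 3], [0, 3, 0]]

print(glcm_contrast(smooth, levels=4))
print(glcm_contrast(edgy, levels=4))
```

The homogeneous patch yields a contrast of 0, whereas the checkerboard patch yields 9, because every neighboring pair differs by three gray levels.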

To test the transdiagnostic discrimination power of the texture features for ROP and ROD patients against HC participants, the texture feature maps registered in MNI space were fed into a 20 × 20 nested cross-validation deep-learning scheme for group classification (Supplementary Fig. 4). We focus the description of the results on the contrast feature map, which gives a low weight to elements with similar gray-level values and a high weight to elements with dissimilar gray levels (that is, indicating large differences between neighboring voxels73).

Classification framework

As we were interested in transdiagnostic psychopathology, the ROP and ROD patients were pooled together for individualized brain texture analysis. We fed the registered contrast texture feature maps into a multilayer neural network in MATLAB R2020b to train and cross-validate a model that discriminates the pooled ROP and ROD patients from the HC participants (see Supplementary Fig. 4 for further details). Feature selection (two-sample t-test) was cross-validated in the inner cycle, retaining the top 549 ranked features that best discriminated the two classes74. This number was chosen to reduce the dimensionality of the data and to balance the number of features against the number of participants (curse of dimensionality).
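The t-test-based feature ranking can be sketched as follows. This is a minimal illustration, assuming a Welch (unequal-variance) t statistic and ranking by its absolute value; the exact test variant used in the study is not specified here, and the function names t_statistic and top_k_features as well as the toy data are hypothetical. In a nested cross-validation scheme, the ranking would be computed on the training folds only, so that the held-out participants do not influence which features are selected.

```python
from statistics import mean, variance


def t_statistic(a, b):
    """Welch two-sample t statistic for one feature (an assumed variant)."""
    standard_error = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(a) - mean(b)) / standard_error if standard_error > 0 else 0.0


def top_k_features(X, y, k):
    """Rank features by |t| between class 1 and class 0 and return the
    column indices of the k highest-ranked features."""
    n_features = len(X[0])
    scores = []
    for f in range(n_features):
        patients = [row[f] for row, label in zip(X, y) if label == 1]
        controls = [row[f] for row, label in zip(X, y) if label == 0]
        scores.append(abs(t_statistic(patients, controls)))
    ranked = sorted(range(n_features), key=lambda f: scores[f], reverse=True)
    return ranked[:k]


# Toy training fold: rows are participants, columns are features.
X_train = [
    [5.1, 0.2, 1.0],
    [4.9, 0.1, 1.2],
    [5.3, 0.3, 0.9],
    [1.0, 0.2, 1.1],
    [1.2, 0.1, 1.0],
    [0.8, 0.3, 1.2],
]
y_train = [1, 1, 1, 0, 0, 0]  # 1 = patient, 0 = control

selected = top_k_features(X_train, y_train, k=1)
print(selected)
```

Only the first feature separates the two toy classes, so it is the one retained; in the study, the same idea would keep the top 549 of the ranked features.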

Visualization and evaluation of heatmaps

For localization of the identified brain changes between pooled ROP and ROD versus HC participants, we calculated the relevance of the voxels in each class using the LRP algorithm for multilayer neural networks, as described by Bach and co-workers38. The output of the algorithm is an individual heatmap that represents common changes in brain structure for general psychopathology (see Appendix A in the Supplementary Information for more details).
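The principle of relevance propagation can be sketched for a toy network. The example below applies the ε-stabilized propagation rule, one of the rules described for layer-wise relevance propagation, to a two-layer network with a ReLU hidden layer. Which rule and architecture were used in the study is not stated here, so the rule choice, the function names dense and lrp_dense, the weights and the input are all illustrative assumptions.

```python
def dense(a, W, b):
    """Pre-activations z_k = sum_j a_j * W[j][k] + b_k of a dense layer."""
    return [
        sum(a[j] * W[j][k] for j in range(len(a))) + b[k]
        for k in range(len(b))
    ]


def lrp_dense(a, W, b, relevance_out, eps=1e-9):
    """Redistribute relevance from the outputs of a dense layer to its
    inputs in proportion to each input's contribution a_j * W[j][k]."""
    z = dense(a, W, b)
    relevance_in = [0.0] * len(a)
    for k, (z_k, r_k) in enumerate(zip(z, relevance_out)):
        # Small stabilizer with the sign of z_k avoids division by zero.
        denom = z_k + (eps if z_k >= 0 else -eps)
        for j in range(len(a)):
            relevance_in[j] += a[j] * W[j][k] / denom * r_k
    return relevance_in


# Toy network: 3 inputs -> 2 hidden units (ReLU) -> 1 output, zero biases.
x = [1.0, 2.0, 0.5]
W1 = [[0.5, -1.0], [0.25, 0.5], [-0.5, 1.0]]
b1 = [0.0, 0.0]
W2 = [[1.0], [2.0]]
b2 = [0.0]

# Forward pass.
hidden = [max(0.0, v) for v in dense(x, W1, b1)]
output = dense(hidden, W2, b2)

# Backward relevance pass, starting from the output score itself.
relevance_hidden = lrp_dense(hidden, W2, b2, output)
relevance_input = lrp_dense(x, W1, b1, relevance_hidden)
print(relevance_input)
```

With zero biases, the input relevances sum (up to the stabilizer) to the output score, which is the conservation property that allows the resulting heatmap to be read as a decomposition of the classifier decision over voxels.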

The final images were visualized using the MRIcron toolbox (MRIcron; v1.0.20190902 for Windows). Visualizations of the classification results on the holdout dataset are presented in Fig. 1. The regions of interest were extracted using the AAL-VOIs atlas (AAL-VOIs). GM, WM and CSF were investigated for cortical biomarkers in psychosis. We used the JHU WM tractography atlas39 to label the WM tracts highlighted by the LRP algorithm.

Associations of clusters with clinical variables

We implemented a clustering algorithm to demonstrate shared brain texture patterns across the participants. Our intention was to display the heatmap of each correctly classified participant in the independent validation sample. The independent validation sample was tested with the winning model from the training sample. The voxel relevance values produced by the LRP algorithm were clustered into groups; subsequently, we associated the clusters with symptom severity and outcome profile75. The AP algorithm40 (Supplementary Section 1.4) was selected to cluster the participants’ relevance heatmaps to identify distinct patterns of brain changes in transdiagnostic psychopathology.

In addition, we performed simple and multiple regression analyses within the clusters to model clinical symptoms and outcome profiles from (1) the mean brain relevance (heatmap) and (2) the contributing voxels, in the independent validation baseline and the follow-up holdout dataset.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.