Multimodal fusion is regarded as a promising tool for discovering covarying patterns across multiple imaging types that are impaired in brain diseases such as schizophrenia (SZ). In this article, we investigate the covarying abnormalities underlying SZ in a large Chinese Han population (307 SZs, 298 healthy controls [HCs]). Four types of magnetic resonance imaging (MRI) features, namely regional homogeneity (ReHo) from resting-state functional MRI, gray matter volume (GM) from structural MRI, fractional anisotropy (FA) from diffusion MRI, and functional network connectivity (FNC) derived from group independent component analysis, were jointly analyzed by a data-driven multivariate fusion method. Results suggest that a widely distributed network disruption appears in SZ patients, with synchronous changes in both functional and structural regions, especially in the basal ganglia network, salience network (SAN), and frontoparietal network. This multimodal coalteration was replicated in an independent Chinese sample (40 SZs, 66 HCs). Our results on auditory verbal hallucination (AVH) also provide evidence for the hypothesis that prefrontal hypoactivation and temporal hyperactivation in SZ may lead to failure of executive control and inhibition, which is relevant to AVH. In addition, impaired working memory performance was found to be associated with GM reduction and FA decrease in prefrontal and superior temporal areas in SZ, in both the discovery and replication datasets. In summary, by integrating multiple imaging and clinical information into one framework that views the brain from multiple perspectives, we can combine multiple inferences about SZ from a large-scale population and offer unique perspectives on the missing links between brain function and structure that may not be achievable with separate unimodal analyses.